Author’s Preface


Many papers since my 1948 book have presented numerous appli- 
cations of the ideas sketched in it, particularly about coancestry and 
migration; therefore, in this revised, English edition, I have added a 
few explanatory footnotes, and some formulas about the decrease of 
coancestry with distance. For further information the reader may 
use the new references added to the original bibliography, or my 
recent book [16].

I am grateful to Professor D. M. Yermanos for his many sug- 
gestions and corrections in revising this text and for the care with 
which he has edited and translated it. 


Lyon, 1968 G. MALECOT 


Translator’s Foreword


The need for an English translation of Professor Gustave Malécot’s 
classic work, The Mathematics of Heredity, has been known for some 
time by students of population genetics interested in his approach to 
dealing with problems of population structure. The lack of such a 
translation has curtailed the dissemination of his ideas among 
English-speaking biologists. We are now increasingly concerned 
with population science, yet there are few books in this field. I hope 
that this revised, English edition of Professor Malécot’s book will 
not only enrich the literature now available, but also help bring his 
work the recognition it deserves. 

The Preface by Professor Newton Morton to Probabilités et Hérédité,
published in 1966 by the Presses Universitaires de France,
summarizes well some of the significant aspects of Professor Malé- 
cot’s work, and I have included it here with the kind permission of 
both Professor Morton and the Presses Universitaires de France. 


September 1969 D. M. YERMANOS 


Author's Preface to
the French Edition


The objective of this work is the application of probability theory 
to prove a number of classical formulas as well as a few unpublished 
ones pertaining to genetics and the mathematical theory of evolution. 
Instead of suggesting a unique approach, which would have seemed
too abstract to the biologist, I have preferred to present various 
methods, each adapted to a concrete problem; once the fundamental 
concepts of mathematical genetics are thus simplified, the founda- 
tions will have been laid for experimentation, which is indispensable, 
and the way will be clear for eventual synthesis. I apologize for the 
imperfections of this first text, and I will accept with interest all 
remarks and criticism that anyone would care to make. In particular, 
I would welcome comments on whatever relates to the theory of 
migration, published here for the first time, and which must be 
matched with experimental data. 

I express my gratitude to Professor G. Darmois and the Institute 
of Statistics in Paris for making this work possible. Also, I express 
my appreciation to Professor L. Blaringhem for his valuable en- 
couragement and to Masson et Cie for the care with which they 
have published this book. 


Lyon, 1948 G. MALECOT 
Preface to
Probabilités et Hérédité 


The probabilistic theory of genetic relationship and covariance devel- 
oped by Malécot has been propagated by disciples in other countries, 
notably Crow in the United States, Yasuda and Kimura in Japan, 
and Falconer in Great Britain, and is now universally accepted. The 
application of his results for isolation by distance, begun by Lamotte 
with Cepaea and continued by Yasuda in man, promises to reveal
population structure and the forces that have acted on major genes. 

Malécot’s insight is the more remarkable because Fisher, Haldane, 
and Wright, the great figures of population genetics in the older 
generation, used correlation analysis and did not mind that the 
derivation of correlations from probabilities is far easier than the 
reverse passage. By mid-century a reaction was inevitable. Major 
genes for blood groups, serum proteins, and other polymorphisms, 
as well as lethals and detrimentals, have become the heart of popu- 
lation genetics, and for them correlation partitions are inappropriate. 
At the same time, the invalidity of models of population structure 
based on genetic “islands” and “neighborhoods” has become
apparent. 


From Probabilités et Hérédité by Gustave Malécot, Presses Universitaires de 
France, 1966. 
A probabilistic approach was begun by Cotterman, who in 1941
set forth the conditional probabilities for many kinds of relationship. 
His work had little impact, however, largely because the material of 
his thesis was published in summary, but also because his formulation 
was designed for nearly panmictic human populations and did not 
reveal the full power of probability methods. 

Malécot’s thesis of 1939 followed the classical approach of Fisher 
and Wright. His book of 1948, however, contains in a brief 63 pages 
a profoundly original treatment of relationship, covariance, and 
population structure in terms of probability theory. Every derivation 
began with the genotypic probabilities for a single locus, and with 
astonishing clarity the most complicated properties of Mendelian 
populations were revealed. Malécot identified Wright’s coefficient of 
inbreeding, one of the great unifying concepts of mathematical 
biology, as the probability that uniting gametes are identical by 
descent, and introduced the more general coefficient of kinship 
(parenté) to measure relationship of individuals possibly separated
in time, space, or by other barriers, from which mating pairs are not 
randomly drawn. He replaced Wright’s bewildering diversity of 
inbreeding coefficients relative to different subpopulations by one 
absolute measure of isolation by distance, the relation between the 
mean coefficient of kinship or inbreeding and the marital distance 
between birth places of potential mates. This led Wright to reexamine 
his results and conclude that “neighborhood size,” on which his
analysis of population structure is based, is almost independent of 
Malécot’s basic relation between consanguinity and distance. There 
seems little doubt that research on population structure in the 
foreseeable future will follow the direction set by Malécot.

His later work on population structure was mathematically difficult,
and publications in the Annales de l’Université de Lyon did not
receive immediate recognition. As recently as 1964 Kimura and 
Weiss rediscovered the formula for two-dimensional isolation by 
distance which had been published by Malécot (1959 and earlier), 
and believed that their result was new. This book is therefore doubly 
welcome, as an orderly presentation of principles and as vindication
of the priority of a great French savant who has transformed 
population genetics. 


Professor of Genetics NEWTON E. MORTON
University of Hawaii 


Honolulu, 1966 


THE MATHEMATICS OF HEREDITY 


Chapter 1 


The Mendelian Lottery 


1.1 HEREDITY AND THE LAWS OF MENDEL 


Let us recall the laws of Mendel, taking the four o’clock (Mirabilis 
jalapa) as an example. If we cross a white-flowered plant with a
red-flowered one, we obtain only pink-flowered plants. But if we 
cross pink-flowered plants among themselves, we obtain progeny 
1/4 of which, on the average, have white flowers, 1/2 pink flowers 
and 1/4 red flowers. The traits of the grandparents reappear. This 
is the phenomenon of Mendelian disjunction, or segregation. It can 
be explained by postulating that flower color in the four o’clock is 
determined by a pair of hereditary units or factors, each of which 
can appear in one or the other of two states or genes, which we will 
designate by A or a. Thus, an individual could carry the pairs 
AA, and so have red flowers; Aa, and have pink flowers; or aa, and 
have white flowers. The three states in which the pair can appear 
are called genotypes or zygotes. AA and aa are the homozygotes; 
Aa is the heterozygote. 

The outcome of the cross can be interpreted by the following 
mechanism. The pair of factors of each plant resulting from the 
cross, i.e., of each “offspring,” is obtained by drawing, at random, 


FIGURE 1. Cross AA × aa giving Aa; cross Aa × Aa.

one of the two factors of the father and one of the two factors of the 
mother (see Fig. 1). The cross of an AA with an aa gives only Aa 
(first generation), but the cross of Aa genotypes among themselves 
gives: AA with a probability of 1/4 (of drawing an A from both 
parents); Aa, 1/2; and aa, 1/4. This interpretation agrees well with 
the many observations of frequencies for individuals in the second 
generation. 
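
As a minimal sketch (not part of the original text), the lottery just described can be enumerated exactly in Python. The function name `cross` and the encoding of a genotype as a two-letter string are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_1, parent_2):
    """Genotype probabilities among the offspring of two one-locus genotypes.

    Each parent is a two-letter string such as "Aa"; one factor is drawn at
    random from each parent, as in the lottery described in the text.
    """
    offspring = Counter()
    for factor_1, factor_2 in product(parent_1, parent_2):
        # Sort the two letters so that "aA" and "Aa" count as one genotype.
        genotype = "".join(sorted(factor_1 + factor_2))
        offspring[genotype] += Fraction(1, 4)
    return dict(offspring)


first_generation = cross("AA", "aa")   # only Aa
second_generation = cross("Aa", "Aa")  # AA 1/4, Aa 1/2, aa 1/4
```

Exact fractions are used so that the result can be compared directly with the ratios 1/4, 1/2, 1/4 quoted above.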

The laws of Mendel explain remarkably well all the phenomena 
of heredity, and it can be said that, with a few rare exceptions, all 
heredity follows the Mendelian process. One has to admit, however, 
that genes can also act in a way different from the one just described. 

(A) Let us consider the example given by Mendel of the cross 
between peas with round seeds and peas with wrinkled seeds. In the 
first generation, we obtain only peas with round seeds; when we 
cross these peas among themselves, 3/4 of their progeny have round 
seeds and 1/4 have wrinkled seeds. This result fits the previous 
scheme perfectly if we postulate that both AA and Aa have round 
seeds, and that only aa has wrinkled seeds. 

In this case, the heterozygote Aa has the same external appearance 
as an AA homozygote, from which it cannot be distinguished except 
by the characteristics of its progeny; that is, we must distinguish 
the genotype, or hereditary constitution, from the phenotype, or 
external appearance. Here the three genotypes give only two phenotypes.
The gene A is dominant over the gene a, i.e., a is recessive,
and the heterozygote exhibits the same phenotype as the dominant 
homozygote. Dominance can be incomplete, so that the heterozygote
is closer to one of the homozygotes, but is nevertheless
distinct.

(B) Characteristics determined by several pairs of factors are 
called multifactorial. For example, the shape of a rooster’s comb 
depends on three pairs of factors: the first pair with genes C (presence 
of comb) dominant over c (rudimentary comb); the second pair R 
(rose) dominant over r (single); the third D (double) dominant over d 
(single). As a result of dominance, the genotypes CCRRdd, CcRRdd, 
CcRrdd, and CCRrdd produce the same phenotype, the rose comb, 
but CCrrdd and Ccrrdd produce a single comb, and ccrrDD and
ccrrDd produce the pea comb (double rudimentary comb). 

The study of crosses shows that segregation of different pairs 
takes place independently. For example, crossing a breed of chickens 
with a rose comb produced by the double heterozygote CcRrdd 
with a breed having a pea comb of genotype ccrrDD produces 
progeny which all have the pair Dd, but which have either Cc or cc, 
and either Rr or rr, each pair having a probability of 1/2. Therefore, 
since these two pairs undergo segregation independently, 1/4 of the 
genotypes of the progeny will be, on the average, CcRrDd, 1/4 will 
be ccRrDd, 1/4 CcrrDd, and 1/4 ccrrDd.
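
The independent segregation of several pairs can be checked in the same spirit. The following is only a sketch: the helper names `gametes` and `cross` and the list-of-pairs encoding are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """All equally likely gametes of a genotype given as a list of pairs,
    e.g. ["Cc", "Rr", "dd"]; one factor is drawn from each pair independently."""
    return ["".join(choice) for choice in product(*genotype)]


def cross(parent_1, parent_2):
    """Exact genotype probabilities among the progeny of two parents."""
    gametes_1, gametes_2 = gametes(parent_1), gametes(parent_2)
    weight = Fraction(1, len(gametes_1) * len(gametes_2))
    progeny = Counter()
    for g1, g2 in product(gametes_1, gametes_2):
        # Pair the factors locus by locus; sort so that "cC" is written "Cc".
        genotype = "".join("".join(sorted(x + y)) for x, y in zip(g1, g2))
        progeny[genotype] += weight
    return dict(progeny)


# Rose comb (double heterozygote CcRrdd) crossed with pea comb (ccrrDD).
progeny = cross(["Cc", "Rr", "dd"], ["cc", "rr", "DD"])
```

Every offspring carries Dd, and the four combinations of Cc or cc with Rr or rr each receive probability 1/4, in agreement with the text.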

(C) In the three examples so far, the characteristics observed
constituted a discontinuous series. Karl Pearson, who was, with Francis
Galton, the founder of biometry, distinguished from this “alternative”
heredity the “continuous” or “blending” heredity, as, for
example, that of stature or skin color in the human species. If one
observes enough children from a given couple, one finds that the 
statures of the children are grouped around a mean value which 
depends on the statures of the two parents, and conform to a bell- 
shaped curve, with extreme deviations being rare but possible. It 
seems that there is a blending of the parental characters, complicated 
by fluctuations. In the same way, when two mulattoes marry, their 
children can vary greatly in skin color; although most of the children 
will have skin color more or less like that of the parents, from time
to time a completely black or completely white genotype also occurs.*
All these results are perfectly explained by Mendel’s laws, on the 
basis that stature or skin color results from the accumulated effects 
of a large number, n, of Mendelian factors which disjoin independently.
To illustrate the point, if, in each pair of factors, one of the
two possible genes adds 1 mm to, and the other subtracts 1 mm
from, the mean stature, and if one crosses two individuals in whom
all pairs are heterozygous, $A_1a_1, A_2a_2, \ldots, A_na_n$, in the offspring each
pair of factors will have the probabilities 1/4, 1/2, and 1/4 of being
in the states $A_iA_i$, $A_ia_i$, and $a_ia_i$, and of contributing to the stature
2 mm, 0 mm, and $-2$ mm, respectively, as in two independent random
choices.

If the n different pairs are stochastically independent, the stature 
of a child will be nothing but the sum of gains and losses in 2n inde- 
pendent series of random choices. This sum, as we know, for an 
indefinitely large n, follows Gauss’s law of probability, which fits 
the experimental bell-shaped curve. We shall see that the same is true 
whether one supposes dominance to be generally complete or generally
incomplete in each pair or supposes different contributions for
different pairs. This general scheme of multifactorial Mendelian 
heredity will be developed in detail in the second chapter, and will 
be shown to explain the results of biometry as discovered by Galton 
and Pearson [5, 17, 18].†
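
The approach to Gauss's law can be illustrated numerically. The sketch below computes the exact distribution of the stature deviation for the additive example of the text (each pair contributing 2, 0, or −2 mm) and compares it with a Gaussian of the same variance. The value n = 50 and the function names are assumptions chosen only for illustration.

```python
from math import comb, exp, pi, sqrt


def stature_distribution(n):
    """Exact distribution of the deviation (in mm) from the mean stature among
    offspring of two parents heterozygous at all n pairs of factors.

    Each pair contributes +2, 0 or -2 mm with probabilities 1/4, 1/2, 1/4,
    so the total is a sum of 2n independent choices of +1 or -1 mm.
    """
    return {2 * k - 2 * n: comb(2 * n, k) / 4 ** n for k in range(2 * n + 1)}


def gauss_approximation(x, n):
    """Gaussian probability attached to the value x: variance 2n, and a factor
    2 because the possible deviations are spaced 2 mm apart."""
    variance = 2 * n
    return 2 * exp(-x * x / (2 * variance)) / sqrt(2 * pi * variance)


n = 50  # illustrative number of pairs of factors (an assumption)
exact = stature_distribution(n)
largest_gap = max(abs(p - gauss_approximation(x, n)) for x, p in exact.items())
```

For this choice of n the largest discrepancy between the exact probabilities and the Gaussian values is already well below one part in a thousand.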


1.2 THE CHROMOSOMES 


The physiological basis of Mendelian heredity was discovered in the 
small rods or chromosomes that are constituents of the nucleus of the 
reproductive cells or gametes. These chromosomes have a fixed 
number, n, in each species (23 in man, 4 in drosophila) and sometimes


* To be precise, one black and one white child will occur in every fourteen
children, on the average. D. M. Y.
† Bold numbers in brackets refer to the literature cited at the end of the book.


FIGURE 2.


exhibit differences among themselves which make it possible
to recognize in two different gametes the homologous chromosomes 
(see Fig. 2 for drosophila). When a paternal gamete unites with a 
maternal gamete, the fertilized egg has n pairs of homologous 
chromosomes. This chromosomal constitution persists in all the 
cells that the egg produces by division and, finally, in all the cells 
of the adult individual except the reproductive cells or gametes; 
the latter are produced by a division or disjunction which allows 
only one chromosome of each pair to be included in each gamete, 
this chromosome being taken at random from the two. Union of 
these gametes, at random, with the gametes of the other parent 
produces the individuals of the following generation. 

The laws of Mendel are explained, therefore, by postulating that 
the two factors of a pair are carried by two homologous chromo- 
somes. Two heterozygous parents, Aa, will each form gametes one 
half of which are A and one half a; this is disjunction. Random union 
of these gametes will produce offspring in the ratio 1/4 AA, 1/2 Aa, 
and 1/4 aa. 
A difficulty appears, however; the different pairs of factors can
be independent only if each pair is carried by a different pair of 
chromosomes, each of the latter being expected to disjoin independently.
Factors carried by the same pair of chromosomes should be
completely linked. However, given a genotype in which AB/ab represents
two pairs of factors linked onto two chromosomes, if this
genotype is crossed with a homozygote ab/ab, the genotypes actually
found in the progeny are AB/ab and ab/ab, each with a frequency of
$(1 - r)/2$, and Ab/ab and aB/ab, each with a frequency of $r/2$, where
r is, in general, a small positive number, but not zero as it would be 
if linkage were complete. These ratios can be explained only by 
postulating that the chromosomes break and exchange homologous 
sections before disjunction. The four types of gametes formed have 
the given frequencies because of this exchange, whose probability 
of occurring is r. This is the phenomenon of “crossing over.” A
study of this phenomenon leads to the assumption that each factor
is localized at a specific point on the chromosome; this point is its
locus (plural, loci). For two factors found on the same chromosome,
it is evident that the farther apart they are located on the chromo- 
some, the greater will be the probability r of their crossing over. 
This phenomenon made it possible to map the loci of the different 
factors for the four pairs of chromosomes of drosophila. We shall 
see in the following chapter that, because of crossing over, linkage 
of factors on the same chromosome does not prevent them, in the 
long run, from being as reshuffled as if they were independent. 

Sex is determined by a pair of chromosomes, two X chromosomes 
for the female, an X and a Y chromosome for the male (except in 
lepidoptera and birds), called heterosomes; the other chromosomes 
are called autosomes. The factors carried on the heterosomes are 
called “sex-linked”; we know primarily those carried on the X chromosome.
These factors are never masked in males, but can
be in females (as with daltonism and hemophilia).

Whatever the physiological process by which genes affect the 
development of individuals, it is to be expected that a single pair 
of genes can affect several characteristics. For example, the recessive
gene for albinism produces, in the homozygous recessive aa, both 
white hair and red eyes (due to lack of pigment). On the other hand, 
a single characteristic, such as stature, can be influenced by many 
pairs of genes. Although, strictly speaking, each characteristic depends
on the entire gene complement (on the total “genetic constitution”),
each characteristic is in fact influenced appreciably by
only one pair of genes, which made it possible for Mendel to deduce 
his laws. 

We have so far been tacitly assuming that the genes occupying a 
specific locus can appear in only two different states, represented 
by A and a, which we call allelic genes. In reality, they can also have 
multiple states $A', A'', A''', \ldots, A^{(n)}$, which we call multiallelism.
There are n homozygotes and $_nC_2 = n(n-1)/2$ heterozygotes.* The
origin of allelic genes is to be traced back to the phenomenon of 
mutation, which appears to be (at least according to present observa- 
tions, since one cannot guess what took place in paleontological 
times) the only inheritable way in which living organisms can be 
modified and, therefore, the only one that affects the evolution of 
species. Mutation is an abrupt change affecting one of two homol- 
ogous loci in an individual; this change is, therefore, transmitted 
to one half its gametes. Thus, in a population of homozygous 
AA individuals, which are indistinguishable in terms of this pair of 
genes, there may appear an Aa heterozygote, which, even though 
A is dominant, can be identified by its progeny. Mutation produced 
the new gene a, which is allelic to the old gene A. Repeated muta- 
tions affecting the same locus can continue to create the same gene a 
(recurrent mutation), or cause gene a to revert to A (reverse muta- 
tion), or create other alleles (multiallelism). 
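
The count of n homozygotes and n(n − 1)/2 heterozygotes can be illustrated by listing the genotypes for the three ABO alleles mentioned in the footnote. This is a sketch; the function names are assumptions.

```python
from itertools import combinations_with_replacement


def genotypes(alleles):
    """All unordered pairs of alleles: homozygotes and heterozygotes."""
    return list(combinations_with_replacement(alleles, 2))


def abo_phenotype(genotype):
    """Blood group of an ABO genotype, with A and B dominant over O."""
    present = set(genotype) - {"O"}
    return "".join(sorted(present)) or "O"


abo = genotypes(["A", "B", "O"])
homozygotes = [g for g in abo if g[0] == g[1]]    # n = 3 of them
heterozygotes = [g for g in abo if g[0] != g[1]]  # n(n - 1)/2 = 3 of them
phenotypes = {g: abo_phenotype(g) for g in abo}
```

The six genotypes collapse into the four phenotypes A, B, AB, and O because of the dominance of A and B over O.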


* For example, the four blood groups in man are determined by three alleles, 
A, B, and O, A and B being dominant over O, which give the four phenotypes 
A (genotype AA or AO), B (BB or BO), AB (universal recipient), and O (OO) 
(universal donor). 
1.3 RESEMBLANCE BETWEEN
RELATED INDIVIDUALS 


Two individuals in a population are related if they have one or more 
common ancestors. If they do, their genetic difference must be 
smaller, on the average, than that between two individuals taken at 
random, because some of the genes of the first two are derived from 
the corresponding genes of the common ancestor. Disregarding 
mutations, these genes cannot be different, whereas they often could 
be in unrelated individuals. 

For precision in terminology, let us distinguish between factors, 
genes, and loci. Let us call genes the different states in which each 
factor can appear without regard to the individual in which they are 
observed. Two genes corresponding to the same factor and observed 
either in the same individual or in two different individuals will be 
called identical or different, depending on whether they appear in 
the same state, for example, A, or in two allelic states, for example, 
A and a. However, two loci will be called identical only if they were
derived by Mendelian descent from the same locus of the same 
common ancestor; otherwise they will be called different. Two 
identical loci are by necessity occupied by two identical genes, if 
there is no mutation, but two different loci can be occupied by 
either two identical or two different genes. 

An individual, I, has two parents, four grandparents, ..., $2^n$ ancestors
of order n. A locus of I has a probability of 1/2 of being
derived from the father, 1/2 from the mother, 1/4 from each of
the grandparents, $1/2^n$ from each ancestor of order n, along a given
chain of descent. An ancestor of I can be linked to it by several
chains of descent; for example, J in Figure 3 is a multiple ancestor,
and could even be an ancestor of different order in different chains.

We will designate as the coefficient of coancestry, $f_{IL}$, of two
individuals I and L the probability that two homologous loci, one
from I and the other from L, are identical, i.e., are descended from
the same locus. The complementary probability, $1 - f_{IL}$, is the probability


FIGURE 3.


that these two loci come from unrelated ancestors, i.e., that
they are stochastically independent, since knowing the gene which 
occupies one locus does not provide any information about the gene 
which occupies the other; these two genes can be identical or differ- 
ent, but their probabilities are independent. 

We shall designate as the coefficient of inbreeding, $f_M$, of an individual
M the probability that its two homologous loci are identical.
Since one locus is derived from its father and the other from its
mother, $f_M$ is nothing but the coefficient of coancestry of its two
parents.

Let us evaluate the coefficient of coancestry, $f_{IL}$, of two individuals,
I and L. It differs from zero only where I and L have one
or more common ancestors, $J_1, J_2, J_3$, etc., which we will assume
they have. Let us suppose at first that there is only one common
ancestor, J, of order n for I and of order p for L along two distinct
chains of descent, which together constitute a chain of coancestry
linking I and L.

The probability that one locus of I and one homologous locus of
L are both derived from J is $(1/2)^{n+p}$. If they are both derived from J,
they have a probability of 1/2 of being derived from the same locus
of J and a probability of 1/2 of being derived from different loci;
if they are from different loci, the probability that they will be
identical is $f_J$. From this, $f_{IL} = (1/2)^{n+p}(1 + f_J)/2$. In particular,
the coefficient of coancestry of an individual and an ancestor of
order n is given by letting p = 0; the coefficient of coancestry of
an individual with itself is given by letting n = p = 0.

Let us now consider the general case, in which I and L are connected
by any number of chains of coancestry, each chain being the
combination of two chains of descent leading from I and from L to
a common ancestor $J_i$ and having no other common point except $J_i$;
two chains of coancestry are considered distinct, even if they have
links in common, provided they differ by at least one link. Since the
transmission of identical loci along a specified chain of relationship
excludes their transmission along any other, the principle of total
probability gives


$$f_{IL} = \sum_i (1/2)^{n_i + p_i}\,(1 + f_{J_i})/2.$$


The sum $\sum$ extends over all distinct chains of relationship connecting
I and L; the ith chain has $n_i + p_i$ links ascending from I and from L
to the common ancestor $J_i$, whose coefficient of inbreeding is $f_{J_i}$.
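
The chain formula can be sketched as a small function and applied to a few familiar relationships. The function names and the tuple encoding of a chain are assumptions, and in every example the common ancestors are taken to be unrelated and not inbred.

```python
from fractions import Fraction


def chain_contribution(n, p, f_ancestor=0):
    """Contribution of one chain of coancestry: n links ascending from I,
    p links ascending from L, common ancestor with inbreeding f_ancestor."""
    return Fraction(1, 2) ** (n + p) * (1 + Fraction(f_ancestor)) / 2


def coancestry(chains):
    """Sum over distinct chains, each given as (n, p) or (n, p, f_ancestor)."""
    return sum((chain_contribution(*chain) for chain in chains), Fraction(0))


self_coancestry = coancestry([(0, 0)])           # 1/2
parent_offspring = coancestry([(1, 0)])          # 1/4
full_sibs = coancestry([(1, 1), (1, 1)])         # 1/4
half_sibs = coancestry([(1, 1)])                 # 1/8
uncle_niece = coancestry([(1, 2), (1, 2)])       # 1/8
first_cousins = coancestry([(2, 2), (2, 2)])     # 1/16
double_first_cousins = coancestry([(2, 2)] * 4)  # 1/8
```

Since the inbreeding coefficient of an individual is the coancestry of its parents, the value 1/16 for first cousins is the inbreeding coefficient of their offspring.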

For example, if we assume that all chains of relationship are
shown in Figure 4 and that the ancestors A and B are not related,
there are the following distinct chains and respective contributions
to the coefficient $f_{FG}$: GCF = 1/8; GEF = 5/32; GCAEF = 1/32;
FCAEG = 1/32; GCADEF = 1/64; FCADEG = 1/64; GCBDEF =
1/64; FCBDEG = 1/64. Therefore, $f_{FG} = 13/32$.


FIGURE 4.
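
The total of 13/32 can be cross-checked by a recursive computation that does not enumerate chains. The pedigree used below is an assumption: it is reconstructed from the chain names listed in the text (C and D children of A and B, E child of A and D, F and G children of C and E), since the figure itself is not available here.

```python
from fractions import Fraction
from functools import lru_cache

# Pedigree reconstructed from the chain names in the text (an assumption).
# child -> (parent, parent); the founders A and B have no recorded parents.
PEDIGREE = {
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("A", "D"),
    "F": ("C", "E"),
    "G": ("C", "E"),
}
ORDER = ["A", "B", "C", "D", "E", "F", "G"]  # ancestors listed before descendants


@lru_cache(maxsize=None)
def coancestry(x, y):
    """Probability that a locus of x and a homologous locus of y are identical."""
    if x == y:
        if x not in PEDIGREE:
            return Fraction(1, 2)  # founder assumed not inbred
        parent_1, parent_2 = PEDIGREE[x]
        return (1 + coancestry(parent_1, parent_2)) / 2
    # Always ascend from the individual that comes later in the pedigree.
    if ORDER.index(x) < ORDER.index(y):
        x, y = y, x
    if x not in PEDIGREE:
        return Fraction(0)  # two distinct founders are taken as unrelated
    parent_1, parent_2 = PEDIGREE[x]
    return (coancestry(parent_1, y) + coancestry(parent_2, y)) / 2


f_FG = coancestry("F", "G")
f_E = coancestry("A", "D")  # inbreeding coefficient of E
```

Under this reconstruction the recursion returns 13/32 for F and G, and the inbreeding coefficient 1/4 of E accounts for the contribution 5/32 of the chain GEF.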

Let us suppose now that the loci considered may have an average
rate of mutation, u, per generation. The probability that a locus of
an individual reproduces without modification the parental locus
from which it was derived is $1 - u$; therefore, the probability that
this locus may be transmitted without modification along a specified
chain of descent having n links is $[(1 - u)/2]^n$. The coefficient of
coancestry of two related individuals then becomes


$$f_{IL} = \sum_i \left(\frac{1 - u}{2}\right)^{n_i + p_i} \frac{1 + f_{J_i}}{2}.$$


The correction thus introduced is insignificant for close relatives, 
since u is extremely small; it becomes important, as we shall see, 
only when very distant ancestors are involved. 
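
The size of the mutation correction can be gauged with a short numerical sketch. The mutation rate u = 10⁻⁶ and the chain lengths are illustrative assumptions, as are the function names.

```python
def coancestry_with_mutation(chains, u):
    """Coefficient of coancestry with mutation rate u per generation.

    chains: iterable of (n, p, f_ancestor), one entry per distinct chain.
    """
    return sum(((1 - u) / 2) ** (n + p) * (1 + f) / 2 for n, p, f in chains)


def mutation_discount(links, u):
    """Factor (1 - u)**links by which mutation shrinks one chain's contribution."""
    return (1 - u) ** links


u = 1e-6  # illustrative mutation rate per generation (an assumption)
first_cousins = coancestry_with_mutation([(2, 2, 0.0), (2, 2, 0.0)], u)
close = mutation_discount(4, u)          # a chain of 4 links (first cousins)
distant = mutation_discount(200_000, u)  # a very long chain of descent
```

For first cousins the coefficient stays within a few parts per million of 1/16, whereas for a chain of 200,000 links the contribution is reduced by roughly one fifth.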


Chapter 2


Correlations Between 
Relatives in an Isogamous 
Stationary Population 


2.1 PROBABILITIES OF GENES 
AND GENOTYPES 


Let us classify the individuals of a population F according to the 
states of a specified pair of factors. Let us first suppose that there 
are only two alleles, A and a, and, therefore, the three genotypes 
AA, Aa, aa, with the respective frequencies P, 2Q, R, where
$P + 2Q + R = 1$. Let us define the frequencies of genes A and a
as the quantities p and q, where $p = P + Q$, $q = Q + R$, and
$p + q = 1$.

These quantities are the probabilities that a gene taken at random 
from any individual in population F is in state A or a, respectively. 
In each individual I of the population F, each of the two homologous
loci will be occupied by gene A or a with the probability p or q. 
However, there will generally be a relationship between the proba- 
bilities of these two loci, that is, a correlation between the states of 
these two loci, because knowing which gene occupies one of these 
loci affects the probabilities for the other locus. In fact, the two 
parents of the preceding generation from which these loci are 
descended could have been selected according to their relationship 
(consanguinity) or according to their resemblance (assortative mating),
or they could have left only selected descendants because of
differential fecundity; if so, any information on the genotype of one 
parent modifies the probabilities for the other. In this chapter we 
shall deal with the following two cases. 

(A) The parents mate at random; the probability of finding a
mate is the same for all individuals; and fecundity is the same for
all couples. This is “random mating,” or panmixia. In this case, knowing
the gene which occupies one of two loci of I gives us no information
about the other; the states of these two loci are stochastically
independent. Therefore, I may have one of the three genotypes
AA, Aa, aa, with probabilities $p^2$, $2pq$, $q^2$. If the population is large,
the observed frequencies P, 2Q, R, must be close to these quantities.
To prove this, it is sufficient to show that $Q^2 - PR$ is close to zero
(Hardy’s law), because we can set $P = p^2 + \lambda$, $2Q = 2pq - 2\mu$,
$R = q^2 + \nu$, and since we have set $P + Q = p$, $Q + R = q$, and
$p + q = 1$, we have $\lambda = \mu = \nu$; therefore,


$$Q^2 - PR = (pq - \lambda)^2 - (p^2 + \lambda)(q^2 + \lambda) = -\lambda,$$


which equals 0 only when $\lambda = 0$. Natural populations actually exist
in which Hardy’s law is confirmed, e.g., the population of coleoptera 
Dermestes vulpinus observed by Philip [19] (the pair of factors 
studied determines wing color). We shall see that there are such 
populations in the human species, too, for blood groups. 
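
The identity between Q² − PR and the deviation from the panmictic proportions can be verified with exact arithmetic. The genotype frequencies below are hypothetical, chosen only for illustration, and the function names are assumptions.

```python
from fractions import Fraction


def gene_frequencies(P, two_Q, R):
    """Gene frequencies p, q from the genotype frequencies P (AA), 2Q (Aa), R (aa)."""
    Q = two_Q / 2
    return P + Q, Q + R


def panmixia_deviation(P, two_Q, R):
    """Return lambda = P - p**2, the excess of AA over its panmictic value."""
    p, _ = gene_frequencies(P, two_Q, R)
    return P - p * p


# Hypothetical genotype frequencies, chosen only for illustration.
P, two_Q, R = Fraction(30, 100), Fraction(40, 100), Fraction(30, 100)
p, q = gene_frequencies(P, two_Q, R)
Q = two_Q / 2
lam = panmixia_deviation(P, two_Q, R)
hardy_statistic = Q * Q - P * R  # equals -lam
```

Here p = q = 1/2 and the excess of homozygotes is 1/20, so the statistic equals −1/20; for frequencies in the proportions p², 2pq, q² it vanishes.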

(B) The parents mate according to their consanguinity without 
considering their genotypes or resemblance; the probability of finding 
a mate is the same for all individuals; and all couples have the same 
fecundity. This is pure consanguinity or isogamy. Therefore, a locus 
in any individual, whether derived from a consanguineous cross or 
not, has always the same probabilities, p and q, of carrying the genes
A or a; furthermore, for any individual I whose coefficient of
inbreeding $f_I = f$ is known, the two homologous loci have, as we
have seen, the probability f of being identical and the probability
$1 - f$ of being stochastically independent; therefore, the three genotypes
will occur with the probabilities $fp + (1 - f)p^2 = p^2 + fpq$,
$2(1 - f)pq$, and $fq + (1 - f)q^2 = q^2 + fpq$. (For example, to have
the first genotype, the two loci should be identical and one of them
should be A, or they should be independent and both of them should
be A.)

Consanguinity, therefore, causes an appreciable increase in the 
probability of homozygotes and a decrease in the probability of 
heterozygotes. This fact explains the danger of marriages between 
related persons; latent defects in the human species are generally 
determined by rare recessive genes, and appear only in homozygous 
recessives aa. If q is the frequency, presumably low, of a defective
gene a, the probability that an individual I carries the defect, i.e.,
that it is of the genotype aa, will be equal to $q^2$ (which is extremely
low) if the parents of I are not related; but this probability increases
to $q^2 + fpq \approx fq$ if $f_I$ is rather high. For example, a defect brought
about by a gene with frequency $q = 10^{-4}$ will appear with the probability
$10^{-8}$ in an offspring without inbreeding, but with the probability
$10^{-4}/16$ in an offspring of first cousins (f = 1/16).* The
danger is doubled for double first cousins (f = 1/8). It is thus
unreasonable to tolerate marriage between double first cousins and 
between uncle and niece, and to forbid marriage between half-sibs, 
which presents exactly the same danger (f = 1/8) [6]. 
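
The numerical example for a rare recessive defect can be reproduced exactly. This is a sketch under the assumptions of the text (two alleles, isogamy); the function name is illustrative.

```python
from fractions import Fraction


def genotype_probabilities(p, f):
    """Probabilities of AA, Aa, aa for gene frequency p of A and
    inbreeding coefficient f (f = 0 corresponds to random mating)."""
    p, f = Fraction(p), Fraction(f)
    q = 1 - p
    return {
        "AA": p * p + f * p * q,
        "Aa": 2 * (1 - f) * p * q,
        "aa": q * q + f * p * q,
    }


q = Fraction(1, 10_000)  # frequency of the rare recessive gene a
unrelated = genotype_probabilities(1 - q, 0)["aa"]
first_cousins = genotype_probabilities(1 - q, Fraction(1, 16))["aa"]
risk_ratio = first_cousins / unrelated
```

With these values the probability of the defect in the offspring of first cousins is about 626 times that in the offspring of unrelated parents, which is the order of magnitude given by the approximation fq.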

Let us consider now the more general case of multiallelism. Suppose
that the allelic genes $A_i$ have the frequencies $p_i$ ($\sum p_i = 1$).

(1) With random mating, the probabilities of different genotypes
are $p_i^2$ for an $A_iA_i$ homozygote, $2p_ip_j$ for an $A_iA_j$ heterozygote,
these probabilities being coefficients in the expansion of $(\sum p_it_i)^2$.
These formulas approximate well the frequencies of blood groups
in a homogeneous population ($p^2 + 2pr$, $q^2 + 2qr$, $2pq$, $r^2$).

(2) In the more general case of isogamy, these probabilities, for
an individual with coefficient of inbreeding f, are, respectively,
$fp_i + (1 - f)p_i^2 = p_i^2 + fp_i(1 - p_i)$ and $2(1 - f)p_ip_j$,
these being coefficients in the expansion of $f\sum p_it_i^2 + (1 - f)(\sum p_it_i)^2$.


* In effect, it has been found that 1/2 of the cases of Friedreich’s ataxia, as well
as 1/3 of the cases of albinism, are derived from marriages between relatives.
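
The multiallelic formulas can be sketched for any set of alleles and applied to the blood groups. The gene frequencies used below are hypothetical and serve only as an illustration; the function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def genotype_probabilities(freqs, f=0):
    """Genotype probabilities for alleles with frequencies freqs (a dict
    allele -> frequency summing to 1) and inbreeding coefficient f."""
    f = Fraction(f)
    probs = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        pa, pb = Fraction(freqs[a]), Fraction(freqs[b])
        if a == b:
            probs[a + b] = pa * pa + f * pa * (1 - pa)  # homozygote
        else:
            probs[a + b] = 2 * (1 - f) * pa * pb        # heterozygote
    return probs


def abo_phenotypes(p, q, r):
    """Blood-group frequencies under random mating for gene frequencies
    p (A), q (B), r (O)."""
    g = genotype_probabilities({"A": p, "B": q, "O": r})
    return {"A": g["AA"] + g["AO"], "B": g["BB"] + g["BO"],
            "AB": g["AB"], "O": g["OO"]}


# Hypothetical gene frequencies, for illustration only.
groups = abo_phenotypes(Fraction(3, 10), Fraction(1, 10), Fraction(6, 10))
inbred = genotype_probabilities(
    {"A": Fraction(3, 10), "B": Fraction(1, 10), "O": Fraction(6, 10)},
    Fraction(1, 8),
)
```

The four phenotype frequencies are the expressions p² + 2pr, q² + 2qr, 2pq, and r², and the genotype probabilities sum to 1 for any value of f.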


2.2 THE DISTRIBUTION OF FACTORS 
IN AN ISOGAMOUS POPULATION 


Let us call “isogamous” a population, F, derived from parents 
chosen either at random or because of their coancestry (but exclud- 
ing all homogamy), and in which all pairs have the same fecundity. 
Assume that the proportion of couples having a coefficient of
coancestry $f_i$ is $w_i$ (the proportion $w_0$ corresponds to random mating,
with $f_0 = 0$); $w_i$ is, therefore, the frequency of individuals in the
population with inbreeding coefficient $f_i$, and $\sum_i w_i = 1$. We have
seen that the probabilities of the alleles A and a (assuming only two
of them for simplicity) are the same among these individuals as in
the total population, i.e., p and q.

The probabilities of the three genotypes in the entire population, 
and, accordingly, their frequencies, $P$, $2Q$, $R$, if the population is
large, are


$$\sum_i w_i\,p(p + f_i q), \qquad \sum_i 2w_i\,pq(1 - f_i), \qquad \sum_i w_i\,q(q + f_i p),$$


which can also be written as


$$p(p + \alpha q), \qquad 2pq(1 - \alpha), \qquad q(q + \alpha p),$$


setting $\alpha = \sum_i w_i f_i$; $\alpha$ is the mean inbreeding coefficient of the
population, the mean of the coefficients of its individuals. It had been
introduced a priori by Bernstein [2] to measure deviations from
panmixia. Its approximate evaluation has been tested on some
human populations with the help of state census data on consanguineous
marriages. In general, this coefficient is small: in a
rural Austrian population, Reutlinger found $\alpha$ to be 0.6 per cent;
in a Jewish population Orel found $\alpha$ to be a little over 1 per cent.
These estimates, however, are probably much below the actual
coefficient, because the distant relationships, which are overlooked,
play as important a role as the close ones.
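
That the mixture over inbreeding classes collapses to a single mean coefficient can be verified on a small example. This is only a sketch: the class weights below are hypothetical values chosen for illustration, and the function names are introduced here.

```python
from fractions import Fraction

def genotype_freqs(p, f):
    """Probabilities of AA, Aa, aa for gene frequency p and inbreeding coefficient f."""
    q = 1 - p
    return (p * (p + f * q), 2 * p * q * (1 - f), q * (q + f * p))

def population_freqs(p, classes):
    """Average the genotype probabilities over classes given as (w_i, f_i) pairs."""
    totals = [Fraction(0)] * 3
    for w, f in classes:
        for idx, g in enumerate(genotype_freqs(p, f)):
            totals[idx] += w * g
    return tuple(totals)

# Hypothetical population: 97% random matings, 2% first cousins, 1% double first cousins.
classes = [
    (Fraction(97, 100), Fraction(0)),
    (Fraction(2, 100), Fraction(1, 16)),
    (Fraction(1, 100), Fraction(1, 8)),
]
alpha = sum(w * f for w, f in classes)  # mean inbreeding coefficient
p = Fraction(3, 10)
print(alpha, population_freqs(p, classes), genotype_freqs(p, alpha))
```

The equality holds because each genotype probability is linear in $f$.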

After considering the segregation of one pair of factors, let us 
now study the simultaneous segregation in the population F of two 
pairs of factors occupied by genes having the states $A_i$ and $B_j$ with
probabilities $p_i$ and $x_j$, respectively.

An individual $I$ taken at random in $F$ results from the union of
two gametes from the preceding generation, $F'$. Let us call $P'_{ij}$ the
probability that any gamete $\Gamma'$ coming from generation $F'$ has in its
chromosomes the genes $A_i$ and $B_j$, and $P_{ij}$ the probability that the
same will be true for any gamete $\Gamma$ from $F$, i.e., for a gamete produced
by $I$; and let us find the relation between $P_{ij}$ and $P'_{ij}$. When
the gamete $\Gamma$ produced by $I$ has the genes $A_i$ and $B_j$, either both are
derived from the same gamete $\Gamma'$ or each came from one of the two
gametes $\Gamma'$ which made up $I$. These two possibilities each have the
probability 1/2, if the two genes are found on two different chromosomes,
because of independent segregation; but they have the probabilities
$1 - r$ and $r$ if the two genes are located on the same
chromosome, because of “crossing over.” The first case is
included in the second by taking $r = 1/2$. Then we have:

$$P_{ij} = (1 - r)P'_{ij} + r\pi_{ij},$$

where $\pi_{ij}$ is the probability that, in generation $F$, a gamete
carrying $A_i$ may unite with a gamete carrying $B_j$.

Different pairs, therefore, are not generally stochastically inde- 
pendent, since their distribution depends on the distribution in 
preceding generations, i.e., on an initial distribution which might 
be arbitrary. It will be shown, however, that there is an “asymptotic
independence” under the following hypotheses.

(1) The population considered is very large, so that frequencies 
and probabilities in each generation are essentially equal. 

(2) The population is isogamous, so that, as we have seen, no gene 
is favored; therefore, in each generation, the gene probabilities will 
remain equal to their frequencies in the preceding generation. As a 
result, the frequencies $p_i$ will remain constant over generations.
These will be the characteristic constants of the population and of
the system of alleles considered. From these, one can derive the
probabilities of the three genotypes for individuals with coefficient 
of inbreeding f, or their frequencies if they are sufficiently numerous. 

(3) The mating system adopted, although it implies a relationship 
between the two gametes that unite, leaves their probabilities of 
carrying different genes independent. This consequence, evident for 
panmixia, is not always valid in crosses between relatives, e.g., when 
the population is divided into groups between which crosses are 
impossible. It can be shown, for example, that it applies to brother- 
sister matings, if all individuals in each generation are brothers and 
sisters of one family; if not, the population would be distributed 
into several groups, and differences between genes existing in these 
groups would continue to exist indefinitely. Let us assume, therefore, 
that the mating system chosen is such that it leaves independent the 
probabilities of one uniting gamete carrying gene $A_i$, the other
gene $B_j$. Then the probability $\pi_{ij}$ of the union of a gamete carrying
$A_i$ with a gamete carrying $B_j$ will be constant and equal to $p_i x_j$.
The above recurrence equation may be written as


$$P_{ij} - p_i x_j = (1 - r)(P'_{ij} - p_i x_j).$$


If, therefore, the $P_{ij}$ of one generation is equal to $p_i x_j$, it will
remain equal to it in the following generations; we say then that the
population is stationary, and we note that the genes of the different
pairs are stochastically independent. In a nonstationary population,
$P_{ij} - p_i x_j \to 0$ as $(1 - r)^n \to 0$ when the number, $n$, of generations
tends to infinity, and the population tends to become stationary;
let us assume in the remainder of the chapter that this state of
equilibrium has been attained, and, in particular, that there is 
stochastic independence of the different factors. 
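
The approach to asymptotic independence can be followed by iterating the recurrence. The sketch below is illustrative only: it assumes two alleles at each pair, $\pi_{ij} = p_i x_j$ as in hypothesis (3), and an arbitrary recombination value $r = 0.2$; the function name is introduced here.

```python
def next_gamete_freqs(P, p, x, r):
    """One generation of P_ij = (1 - r) * P'_ij + r * p_i * x_j."""
    return [
        [(1 - r) * P[i][j] + r * p[i] * x[j] for j in range(len(x))]
        for i in range(len(p))
    ]

p = [0.5, 0.5]            # frequencies of A_1, A_2
x = [0.5, 0.5]            # frequencies of B_1, B_2
r = 0.2                   # assumed recombination fraction
P = [[0.5, 0.0],          # initial gametes: complete association A_1B_1 / A_2B_2
     [0.0, 0.5]]

history = [P]
for _ in range(10):
    P = next_gamete_freqs(P, p, x, r)
    history.append(P)

for n, table in enumerate(history):
    print(n, round(table[0][0] - p[0] * x[0], 6))  # deviation shrinks like (1 - r)^n
```

The marginal frequencies stay fixed while the deviation $P_{ij} - p_i x_j$ is multiplied by $1 - r$ at each generation.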


2.3 RANDOM MENDELIAN VARIABLES IN 
AN ISOGAMOUS STATIONARY POPULATION 


Let us consider a specific trait, e.g., stature, of the individuals that 
make up the population; this trait can be either quantitative and 
measurable, or qualitative and arbitrarily assigned to values on a
numerical scale. Call $y$ the numerical value thus attributed to the
trait in each individual. For an individual $I$ taken at random from
the population, $y$ is a random variable. We shall regard $y$ as being
the sum of a random variable, $x$, which represents the influence of
the genetic constitution of $I$ on the trait considered, and of another
random variable, $z$, which represents the influence of chance and
environment on the development of this trait, $z$ being stochastically
independent of $x$. Consider $x$ the sum of contributions made to the
trait by a certain number of pairs of factors. For example, the
contribution $\mathcal{H}$ of one of its pairs will be equal to $i$, $j$, or $k$, depending
on whether this pair has the state $AA$, $Aa$, or $aa$, whose probabilities
are $p^2 + fpq$, $2(1 - f)pq$, and $q(q + fp)$, respectively, where $p$ and $q$
are the frequencies of $A$ and $a$, and $f$ is the inbreeding coefficient
of $I$. $\mathcal{H}$ will be called the genotypic random variable associated with
the trait and with the pair of factors considered.* If one of the
alleles has complete dominance, $j = i$ or $j = k$. If there is no dominance,
i.e., when the heterozygote is exactly intermediate between
the two homozygotes, $j = (i + k)/2$, or one can let $i = 2t$, $j = s + t$,
and $k = 2s$; one can readily verify that the three-valued random
variable $\mathcal{H}$ is the sum of the two-valued random variables $H$ and $H'$,
each of which has the value $t$ or $s$ with probabilities $p$ or $q$, and
which have the probability $f$ of being identical and the probability
$(1 - f)$ of being independent. $H$ and $H'$, which represent the respective
states of the two loci of the pair, will be referred to as genic
random variables. If there is dominance (complete or incomplete),
we can still keep the random variables $H$ and $H'$ by taking appropriate
values for $s$ and $t$, and letting $\mathcal{H} = H + H' + d$, the component
of dominance, $d$, being equal to $i - 2t$, $j - s - t$, or $k - 2s$,
according to whether $H + H'$ is equal to $2t$, $s + t$, or $2s$ (the most
convenient values for $s$ and $t$ will be discussed later).


* Unless otherwise specified, the sampling unit on which this random variable
depends is an individual taken at random from among those having inbreeding
coefficient $f$.


We have, therefore:


$$y = x + z = \sum \mathcal{H} + z = \sum (H + H' + d) + z,$$


with $\sum$ designating a sum over all pairs of factors influencing the
trait under consideration (for a monofactorial trait, $\sum$ covers only
one term). Since the population under study is assumed to be
stationary, the different terms of $\sum$ are, as we have seen, independent
random variables; z is also assumed to be independent. 

To simplify matters, we shall assume, henceforth, that each of 
these random variables is given its mean value in the population 
(or in a specified subgroup of this population) as an origin. This 
assumption is not restrictive as long as we agree to stipulate that, 
in measuring the characteristic, we take its expectation as equal to 0, 
which is approximately the same as subtracting the general mean in 
the population (or in the subgroup) if it contains a large number of 
individuals. If we symbolize the expectation of a random variable
by $\mathcal{M}$, the stipulation which we made will be expressed by $\mathcal{M}(\mathcal{H}) = 0$,
$\mathcal{M}(x) = 0$, $\mathcal{M}(z) = 0$, and $\mathcal{M}(y) = 0$, and, by selecting appropriate
values of $s$ and $t$, $\mathcal{M}(H) = 0$, $\mathcal{M}(d) = 0$.

The variance of trait y (i.e., the square of its standard deviation) 
in the population (or in the subgroup) because of independence will 
be: 


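$$\mathcal{M}(y^2) = \mathcal{M}(x^2) + \mathcal{M}(z^2) = \sum \mathcal{M}(\mathcal{H}^2) + \mathcal{M}(z^2),$$
which we write as
$$\sigma^2 = \sigma_x^2 + \sigma_z^2,$$


$\sigma$ being the standard deviation.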

All these formulas are also valid for multiallelism. 

The fact that a population is stationary imposes the condition 
that variance remain constant over generations. Thus, the fact that 
variance is conserved, as we know by experience, may be considered 
confirmation of the Mendelian hypothesis of the inheritance of characteristics.
The theories of “blending inheritance” that certain biometricians
tended to accept would imply that, if the hereditary
portion $x$ of a trait in the two parents was equal to $x_1$ and $x_2$, it
would be equal to $(x_1 + x_2)/2$ in the offspring, the remainder of
variance being attributed to chance and to the environment. Given
panmixia, and assuming that $\mathcal{M}(x) = 0$, the variance of $x$ in the
entire progeny of a population, taking $\mathcal{M}(x_1x_2) = 0$, would be
$\mathcal{M}[(x_1 + x_2)/2]^2 = \mathcal{M}(x^2)/2$, i.e., half the variability in the parents;
thus, the genetic variance of $x$ would tend rapidly toward zero after
several generations. Finally, the only variance left would be caused
either by chance and environment (but the experiments of Johannsen
[9] on pure lines have shown that such variance is small for most
traits) or by mutations, which would then have to be very frequent
(but this conclusion contradicts our experience). “Blending inheritance”
is thus inadmissible, and the Mendelian scheme, with indefinite
disjunction of parental traits, is one of the simplest of those that
have the conservation of hereditary variance as a consequence [4]. 
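
The halving of variance under the blending hypothesis is easy to see in a simulation. The following is a rough sketch under simplifying assumptions made here (random mating with replacement, no environmental or mutational variance, an arbitrary population of 20,000 and a fixed random seed); it is not a model taken from the text.

```python
import random
import statistics

def blend_generation(values, rng):
    """Each offspring receives the mean of two parents drawn at random (panmixia)."""
    n = len(values)
    return [(rng.choice(values) + rng.choice(values)) / 2 for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.gauss(0.0, 1.0) for _ in range(20000)]

variances = [statistics.pvariance(population)]
for _ in range(5):
    population = blend_generation(population, rng)
    variances.append(statistics.pvariance(population))

ratios = [variances[g + 1] / variances[g] for g in range(5)]
print([round(v, 4) for v in variances])
print([round(r, 3) for r in ratios])  # each ratio should be close to 1/2
```

After five generations the hereditary variance would be reduced roughly 32-fold, which is the rapid decay the argument relies on.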

Let us now show how the Mendelian scheme leads to the same 
results as biometry. Assume, henceforth, that the trait studied is 
multifactorial and depends on a large number of genes, each making 
a contribution of the same order of magnitude. Therefore, x is the 
sum of a great many independent random variables, each of which 
is small in relation to the standard deviation, $\sigma_x$, of $x$; according to
Liapounov’s theorem, the probability of $x$ follows Gauss’s law,
$\frac{1}{\sigma_x\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma_x^2}\right)dx$. If the effects of chance and environment
on development come from multiple and independent sources,
z and y will also be almost Gaussian, which result agrees with the 
observations of Galton and Pearson on stature. 

Let us measure the trait $y$ for two related individuals, $I_1$ and $I_2$,
and let $y_1$ and $y_2$ be the two respective values. It can be shown that
the joint probability distribution of the two random variables $y_1$ and $y_2$
follows closely Gauss’s law and, therefore, can be expressed by its
coefficient of correlation. The experimental determination of this
coefficient for a large number of pairs of individuals with the same
ancestry in a large population was made for different populations
by Galton [5], Pearson [17, 18], and Snow [20]. We shall now consider
its theoretical value.


2.4 CORRELATIONS BETWEEN RELATIVES 
WITHOUT DOMINANCE 


Without dominance, we have the relationships 
$$y = \sum \mathcal{H} + z = \sum (H + H') + z$$


and
$$\mathcal{M}(H + H') = 2\mathcal{M}(H) = 2(pt + qs).$$


Therefore, the convention $\mathcal{M}(\mathcal{H}) = 0$ is equivalent to $\mathcal{M}(H) = 0$,
that is, $pt + qs = 0$.

Let $y_1$ and $y_2$ be the random variables representing the traits of
two individuals, $I_1$ and $I_2$, with coefficient of coancestry $f$. Without
dominance,

$$y_1 = \sum (H_1 + H_1') + z_1$$


and
$$y_2 = \sum (H_2 + H_2') + z_2.$$


To find their coefficient of correlation, $r$, let us calculate the mean
value of their product, which is reduced to


$$\mathcal{M}(y_1y_2) = \sum \mathcal{M}(H_1H_2 + H_1'H_2 + H_1H_2' + H_1'H_2'),$$


since, because of independence, $\mathcal{M}(z_1H_2) = \mathcal{M}(z_1)\mathcal{M}(H_2) = 0$, and so
on; $\mathcal{M}(z_1z_2) = 0$; and, if $K$ and $K'$ represent the genic random
variables for any other pair, $\mathcal{M}(K_1H_2) = \mathcal{M}(K_1)\mathcal{M}(H_2) = 0$, and so on.

Furthermore, each term, such as $\mathcal{M}(H_1H_2)$, is calculated on the
basis that the random variables $H_1$ and $H_2$ reflect the state of two
homologous loci taken at random on $I_1$ and $I_2$; i.e., they have a
probability $f$ of being identical and $1 - f$ of being independent.
From this,

$$\mathcal{M}(H_1H_2) = f\mathcal{M}(H^2).$$

Thus, $f$ is the correlation coefficient of $H_1$ and $H_2$. Therefore,
$$\mathcal{M}(y_1y_2) = 4f\sum \mathcal{M}(H^2).$$


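On the other hand, if $f_1$ and $f_2$ are the inbreeding coefficients of
$I_1$ and $I_2$, we have


$$\mathcal{M}(y_1^2) = \sum \mathcal{M}(H_1 + H_1')^2 + \mathcal{M}(z^2) = 2(1 + f_1)\sum \mathcal{M}(H^2) + \mathcal{M}(z^2),$$


since $\mathcal{M}(H_1H_1') = f_1\mathcal{M}(H^2)$. Therefore, the correlation coefficient
sought is


$$r = \frac{\mathcal{M}(y_1y_2)}{\sqrt{\mathcal{M}(y_1^2)\mathcal{M}(y_2^2)}} = \frac{2f}{\sqrt{(1 + f_1 + \xi^2)(1 + f_2 + \xi^2)}},$$


calling $\xi^2$ the ratio $\mathcal{M}(z^2)/2\sum \mathcal{M}(H^2)$.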

For a trait determined by heredity only ($\xi^2 = 0$) and for unrelated
individuals (here meaning individuals whose own inbreeding coefficients
$f_1$ and $f_2$ are zero), $r$ reduces to $r_0 = 2f$, which will be called the fundamental
correlation. This gives the familiar coefficients: 1/2 for parent and
offspring and for full sibs; 1/4 for half-sibs, or for grandfather and
grandson, or for uncle and nephew, or for double first cousins;
1/8 for first cousins; and so on. But any coancestry between the
individuals compared, and any effects of environment upon them,
will make $f_1$, $f_2$, and $\xi^2$ unequal to zero, and will reduce the fundamental
correlation.
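
The correlation formula of this section can be evaluated for the usual relationships. The sketch below is only an illustration; the function name and the argument `xi2` (standing for $\xi^2$) are naming choices made here.

```python
import math

def correlation_no_dominance(f, f1=0.0, f2=0.0, xi2=0.0):
    """r = 2f / sqrt((1 + f1 + xi^2)(1 + f2 + xi^2)), without dominance."""
    return 2 * f / math.sqrt((1 + f1 + xi2) * (1 + f2 + xi2))

# Coefficients of coancestry f for some relationships (non-inbred individuals).
coancestry = {
    "parent-offspring": 1 / 4,
    "full sibs": 1 / 4,
    "half-sibs": 1 / 8,
    "first cousins": 1 / 16,
}
for name, f in coancestry.items():
    print(f"{name:18s} r0 = {correlation_no_dominance(f):.4f}")

# An environmental variance equal to the genic variance (xi^2 = 1) halves the correlations.
print(correlation_no_dominance(1 / 4, xi2=1.0))
```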


2.5 CORRELATIONS BETWEEN UNRELATED 
INDIVIDUALS WITH DOMINANCE 


Given that the probabilities of genes $A$ and $a$ are $p$ and $q$, the probabilities
of the three genotypes in the population will be $p^2$, $2pq$,
and $q^2$. Let us still consider that the random variables $\mathcal{H}$ have
origins such that $\mathcal{M}(\mathcal{H}) = p^2i + 2pqj + q^2k = 0$; $d$ takes the values
$i - 2t$, $j - s - t$, and $k - 2s$, with probabilities $p^2$, $2pq$, $q^2$; $t$ and $s$
are the values that each of the random variables $H$ and $H'$ may take
(values which, so far, are arbitrary). Along with Fisher [3], let us
choose values which minimize $\mathcal{M}(d^2)$; we obtain


$$\begin{aligned} p(i - 2t) + q(j - s - t) &= 0,\\ p(j - s - t) + q(k - 2s) &= 0, \end{aligned} \tag{2.5.1}$$

by setting the partial derivatives with respect to $t$ and $s$ equal to zero.
We thus obtain the fixed values, $t = pi + qj$, $s = pj + qk$, which
satisfy the equation $\mathcal{M}(H) = 0$ (because $pt + qs = 0$). Therefore,


$$\mathcal{M}(d) = \mathcal{M}(\mathcal{H}) - \mathcal{M}(H) - \mathcal{M}(H') = 0.$$


Furthermore, equations (2.5.1) indicate that the mean value of $d$
is zero when the value of $H$ (or of $H'$) is fixed. If we set $H$ equal
to $t$, $H'$ (which is independent of $H$ because the individuals are not
related) will take the values $t$ or $s$ with probabilities $p$ or $q$, and $d$ will
take the values of $i - 2t$ or $j - s - t$, whose mean value is equal
to zero in accordance with (2.5.1). It follows that $\mathcal{M}(dH) = \mathcal{M}(dH') = 0$. Thus


$$\mathcal{M}(\mathcal{H}^2) = \mathcal{M}(H + H')^2 + \mathcal{M}(d^2) = 2\mathcal{M}(H^2) + \mathcal{M}(d^2),$$
$$\mathcal{M}(y^2) = 2\sum \mathcal{M}(H^2) + \sum \mathcal{M}(d^2) + \mathcal{M}(z^2).$$
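
The decomposition into genic and dominance components can be checked on a numerical case. The sketch below is illustrative only: the function name, the returned dictionary, and the example values (a completely dominant gene with $p = 1/4$) are assumptions made here. The genotypic values are first centred so that $\mathcal{M}(\mathcal{H}) = 0$, as the text stipulates.

```python
from fractions import Fraction

def fisher_decomposition(p, i, j, k):
    """Genic values t, s and dominance deviations d for one pair of alleles under panmixia."""
    q = 1 - p
    mean = p * p * i + 2 * p * q * j + q * q * k
    i, j, k = i - mean, j - mean, k - mean      # take the mean as origin
    t = p * i + q * j                           # value of H for allele A
    s = p * j + q * k                           # value of H for allele a
    probs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    values = {"AA": i, "Aa": j, "aa": k}
    d = {"AA": i - 2 * t, "Aa": j - s - t, "aa": k - 2 * s}
    return {
        "t": t,
        "s": s,
        "d": d,
        "probs": probs,
        "genic_variance": 2 * (p * t * t + q * s * s),           # 2 M(H^2)
        "dominance_variance": sum(probs[g] * d[g] ** 2 for g in probs),
        "total_variance": sum(probs[g] * values[g] ** 2 for g in probs),
    }

p = Fraction(1, 4)
result = fisher_decomposition(p, 2, 2, 0)   # hypothetical values: AA = Aa = 2, aa = 0
print(result["t"], result["s"])
print(result["genic_variance"], result["dominance_variance"], result["total_variance"])
```

With these choices of $t$ and $s$ the two components are uncorrelated, so their variances add up to the genotypic variance.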


Let us take for two related individuals, $I_1$ and $I_2$, the values
$y_1 = \sum (H_1 + H_1' + d_1) + z_1$ and $y_2 = \sum (H_2 + H_2' + d_2) + z_2$. By
hypothesis, $H_1$ and $H_1'$ are independent, as are $H_2$ and $H_2'$. $H_1$, therefore,
cannot be positively correlated at the same time with both
$H_2$ and $H_2'$. Let us suppose, for example, that $H_1$ is correlated with
$H_2$ only; in that case $H_1'$ can only be correlated with $H_2'$; let us
designate the respective correlation coefficients by $\varphi$ and $\varphi'$. We have


$$\begin{aligned} \mathcal{M}(y_1y_2) = \sum [&\mathcal{M}(H_1H_2) + \mathcal{M}(H_1'H_2') + \mathcal{M}(d_1d_2) + \mathcal{M}(d_1H_2) \\ &+ \mathcal{M}(d_1H_2') + \mathcal{M}(d_2H_1) + \mathcal{M}(d_2H_1')], \end{aligned} \tag{2.5.2}$$


because terms such as $\mathcal{M}(z_1H_2)$, $\mathcal{M}(z_1d_2)$, or $\mathcal{M}(z_1z_2)$ are equal to zero,
since the random variables they include are independent and have
mean values equal to zero.

Furthermore, the last four terms of (2.5.2) are also equal to zero.
If, for example, the value of $H_2$ is fixed, $d_2$ depends only on $H_2'$;
therefore, it is independent of $H_1$, and $\mathcal{M}(H_1d_2) = \mathcal{M}(H_1)\mathcal{M}(d_2)$, but
we know $\mathcal{M}(d_2) = 0$. Therefore, the mean value of $H_1d_2$, being equal
to zero when $H_2$ is fixed, is also equal to zero for any value of $H_2$.
The same is true for the other three terms. Thus
$$\begin{aligned} \mathcal{M}(y_1y_2) &= \sum [\mathcal{M}(H_1H_2) + \mathcal{M}(H_1'H_2') + \mathcal{M}(d_1d_2)] \\ &= (\varphi + \varphi')\sum \mathcal{M}(H^2) + \sum \mathcal{M}(d_1d_2), \end{aligned}$$
and everything goes back to the computation of $\mathcal{M}(d_1d_2)$.


2.5.1 The Two Individuals Are Related 
by Only One of Their Loci 


In this situation, only two of the four genic random variables are
not independent, e.g., $H_1$ and $H_2$; let $\varphi$ be their correlation coefficient,
which depends on their degree of relationship. If we fix $H_1$,
then $d_1$ depends only on $H_1'$ and becomes independent of $H_2$, $H_2'$,
and $d_2$, and its mean value is equal to zero. Then $\mathcal{M}(d_1d_2) = 0$ when
$H_1$ is fixed; therefore, for any value of $H_1$, $\mathcal{M}(d_1d_2) = 0$. Thus,
$\mathcal{M}(y_1y_2) = \sum \mathcal{M}(H_1H_2) = \varphi\sum \mathcal{M}(H^2)$. From this, the correlation
coefficient of $y_1$ and $y_2$ is


$$r = \frac{\mathcal{M}(y_1y_2)}{\mathcal{M}(y^2)} = \frac{\varphi}{2(1 + \eta^2 + \xi^2)},$$

where $\eta^2 = \sum\mathcal{M}(d^2)/2\sum\mathcal{M}(H^2)$, and $\xi^2 = \mathcal{M}(z^2)/2\sum\mathcal{M}(H^2)$. This can also
be written as $r = [(\varphi/2)\tau^2]/\sigma^2$, where $\tau^2$ is the “genic additive
variance,” $2\sum\mathcal{M}(H^2)$, and $\sigma^2$ the “total variance,” $2\sum\mathcal{M}(H^2) +
\sum\mathcal{M}(d^2) + \mathcal{M}(z^2)$. To avoid having to evaluate $\varphi$, we note that,
since $\varphi$ does not depend on dominance, this formula can be written
as $r = r_0\tau^2/\sigma^2$, $r_0$ being the “fundamental correlation” previously
defined. Therefore, for individuals related by one locus only,
dominance plays exactly the same role as chance and the environment
in reducing all the “fundamental correlations” by a fixed ratio
which is less than unity. This formula, in particular, gives, for simple
correlations in direct or collateral line of descent, $r = (1/2)^n\tau^2/\sigma^2$,
with $n = 1$ for parent-offspring correlation, $n = 2$ for grandparent-grandson,
half-sibs, or uncle-nephew, $n = 3$ for first cousins, and
so on. It does not apply to full sibs or to double first cousins, who
are related by two loci at the same time. We shall show (see §2.5.2)
that in these cases, the reduction of the fundamental correlation is
less important because there is a positive correlation, $\mathcal{M}(d_1d_2) > 0$,
between the residual dominance components, $d_1$ and $d_2$, of the two individuals.

Finally, if we wish to find the partial correlation between trait $y$
in an individual, $I_1$, and in one of his ancestors, $I_2$, assuming the
value of that trait as fixed in an intermediate ancestor, $I_3$, who is
separated from them by $n$ and $p$ links, respectively, we can apply the
following classical formula if the regressions are linear (as they are
when the random variables $y$ are Gaussian and in Gaussian relation):


$$\frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{(1/2)^{n+p}\,\tau^2/\sigma^2 - (1/2)^n(\tau^2/\sigma^2)\,(1/2)^p(\tau^2/\sigma^2)}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}.$$


This coefficient, in general positive, is not zero except if $\tau^2/\sigma^2 = 1$,
that is, if there is neither dominance nor influence of the environment;
it is only in this case that, if we know the trait in an ancestor
of $I_1$, similar information from previous ancestors in the same line
of descent would not give us any more information about $I_1$ (no
“ancestral inheritance”). But there is almost always dominance or
influence of the environment, and because of this, knowledge about 
a trait in an ancestor allows a positive correlation among earlier 
ancestors and the descendants. This “law of ancestral inheritance,” 
shown experimentally by Galton and Pearson, is then not at all in 
contradiction (as Bateson and Weldon believed) to the laws of 
Mendel. From Mendel’s laws it follows, indeed, that in making 
predictions about offspring, knowledge of the genetic constitution 
of one ancestor makes all knowledge about earlier ancestors un- 
important. Our study, however, simply shows that knowledge of 
trait y in a given ancestor, when there is dominance or environmental 
effects, provides insufficient information about its genetic constitu- 
tion, and more precise information can be derived from knowledge 
about earlier ancestors. 
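
The behaviour of this partial correlation can be illustrated numerically. The sketch below is only indicative: the function name and the argument `ratio` (standing for $\tau^2/\sigma^2$) are naming choices made here, and the simple correlations are taken from the formula $r = (1/2)^n\tau^2/\sigma^2$ of this section.

```python
import math

def partial_correlation(n_links, p_links, ratio):
    """Partial correlation between I1 and its ancestor I2, the trait of I3 being fixed.

    I3 is n_links generations from I1 and p_links generations from I2;
    ratio is the quotient tau^2 / sigma^2.
    """
    r13 = 0.5 ** n_links * ratio
    r23 = 0.5 ** p_links * ratio
    r12 = 0.5 ** (n_links + p_links) * ratio
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Offspring, parent, grandparent (n = p = 1) for several values of tau^2 / sigma^2.
for ratio in (1.0, 0.8, 0.5):
    print(ratio, round(partial_correlation(1, 1, ratio), 4))
```

The coefficient vanishes when the ratio equals 1 and is positive otherwise, in agreement with the discussion of ancestral inheritance.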


2.5.2 Individuals Are Related
by Two of Their Loci


Let $\mathcal{H}_1 = H_1 + H_1' + d_1$ and $\mathcal{H}_2 = H_2 + H_2' + d_2$. Let us calculate
$\mathcal{M}(\mathcal{H}_1\mathcal{H}_2)$, knowing that $H_1$ and $H_2$ have a correlation coefficient
$\varphi$, that $H_1'$ and $H_2'$ have a coefficient $\varphi'$, and that these two sets
of random variables are independent of each other.

The generating function $V(x, y, u, v)$ of all these four random
variables taken together is, therefore, the product of the generating
functions $V_1(x, y)$ and $V_2(u, v)$ of the two sets $H_1$ and $H_2$, $H_1'$ and $H_2'$.

Let us recall that the generating function of random variables,
taking the respective values $\alpha$, $\beta$, and so on, is, by definition, the
expectation of $x^{\alpha}y^{\beta}\ldots$ (instead of the characteristic function which
is the expectation of $e^{i\alpha x}e^{i\beta y}\ldots$). Therefore,


$$\begin{aligned} V_1(x, y) &= p(p + \varphi q)x^ty^t + pq(1 - \varphi)(x^ty^s + x^sy^t) + q(q + \varphi p)x^sy^s \\ &= (px^t + qx^s)(py^t + qy^s) + \varphi pq(x^t - x^s)(y^t - y^s), \end{aligned}$$


and $V_2(u, v)$ has the same expression, with $x$ replaced by $u$,
$y$ by $v$, and $\varphi$ by $\varphi'$. We know that the generating function of the
two variables taken together, $H_1 + H_1'$ and $H_2 + H_2'$, may be obtained
by setting $x = u$ and $y = v$ in the product $V_1V_2$; it is then
$W(x, y) = V_1(x, y)V_2(x, y) = \sum P_{\alpha\beta}x^{\alpha}y^{\beta}$, the coefficient $P_{\alpha\beta}$ of $x^{\alpha}y^{\beta}$ in
$W(x, y)$ representing, by definition, the probability of also having
$H_1 + H_1' = \alpha$ and $H_2 + H_2' = \beta$, and, therefore, of $\mathcal{H}_1$ and $\mathcal{H}_2$ having
determined values $f(\alpha)$ and $f(\beta)$. Knowing $W$ enables us to calculate
$\mathcal{M}(\mathcal{H}_1\mathcal{H}_2) = \sum P_{\alpha\beta}f(\alpha)f(\beta)$ by replacing (in $W$) $x^{\alpha}$ by $f(\alpha)$ and $y^{\beta}$ by
$f(\beta)$, i.e., $x^{2t}$ and $y^{2t}$ by $i$, and so on. Let us calculate, then,


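$$\begin{aligned} W(x, y) ={}& (px^t + qx^s)^2(py^t + qy^s)^2 \\ &+ (\varphi + \varphi')pq(px^t + qx^s)(x^t - x^s)(py^t + qy^s)(y^t - y^s) \\ &+ \varphi\varphi'p^2q^2(x^t - x^s)^2(y^t - y^s)^2; \end{aligned}$$


by replacing $x^{2t}$ and $y^{2t}$ by $i$, $x^{t+s}$ and $y^{t+s}$ by $j$, $x^{2s}$ and $y^{2s}$ by $k$, we
obtain,
$$\begin{aligned} \mathcal{M}(\mathcal{H}_1\mathcal{H}_2) ={}& (p^2i + 2pqj + q^2k)^2 + pq(\varphi + \varphi')[pi + (q - p)j - qk]^2 \\ &+ p^2q^2\varphi\varphi'(k - 2j + i)^2. \end{aligned}$$
This is a symmetric bilinear form of $\varphi$ and $\varphi'$, in which the coefficients
are well-determined in a given population and are independent of
$\varphi$ and $\varphi'$. In the same way,
$$\begin{aligned} r &= \mathcal{M}(y_1y_2)/\sigma_{y_1}\sigma_{y_2} = \mathcal{M}(x_1x_2)/\mathcal{M}(y^2) = \sum\mathcal{M}(\mathcal{H}_1\mathcal{H}_2)/\mathcal{M}(y^2) \\ &= \sum\{pq(\varphi + \varphi')[pi + (q - p)j - qk]^2 + p^2q^2\varphi\varphi'(k - 2j + i)^2\}/\sigma^2. \end{aligned}$$
Let us calculate the coefficients by giving $\varphi$ and $\varphi'$ specific values.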


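We have seen that, for $\varphi' = 0$, $r$ is reduced to $(\varphi/2)\tau^2/\sigma^2$. We can
write, therefore,


$$r = \{[(\varphi + \varphi')/2]\tau^2 + \varphi\varphi'\varepsilon^2\}/\sigma^2,$$
where $\sigma^2 = \mathcal{M}(y^2) = \sum[2\mathcal{M}(H^2) + \mathcal{M}(d^2)] + \mathcal{M}(z^2)$, the total variance;
$\tau^2 = 2\sum\mathcal{M}(H^2)$, the genic-additive variance; and $\varepsilon^2 = \sum\mathcal{M}(d^2)$,
the dominance variance. If we set $\varphi = \varphi' = 1$, $\mathcal{H}_1$ and $\mathcal{H}_2$ become
identical, and so we have,
$$r = \frac{\sum\mathcal{M}(\mathcal{H}^2)}{\sigma^2} = \frac{\sum[2\mathcal{M}(H^2) + \mathcal{M}(d^2)]}{\sigma^2} = \frac{\tau^2 + \varepsilon^2}{\sigma^2}.$$


These calculations, incidentally, lead to,


$$\tau^2 = \sum 2pq[pi + (q - p)j - qk]^2$$
and
$$\varepsilon^2 = \sum p^2q^2(k - 2j + i)^2.$$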


Comparing this with the formula $\mathcal{M}(y_1y_2) = (\varphi + \varphi')\sum\mathcal{M}(H^2) +
\sum\mathcal{M}(d_1d_2)$, we note that $\sum\mathcal{M}(d_1d_2) = \varphi\varphi'\varepsilon^2 = \varphi\varphi'\sum\mathcal{M}(d^2)$; the correlation
coefficient between the dominance components $d_1$ and $d_2$ of $I_1$ and $I_2$
is therefore $\varphi\varphi'$, which is the product of the correlation coefficients
between the genic random variables. It is zero if $I_1$ and $I_2$ are related
by only one locus, but positive if $I_1$ and $I_2$ are related by two of their
loci, and this results in an increase of their correlation. For example,
for brothers, $\varphi = 1/2$, $\varphi' = 1/2$, and $r = [(1/2)\tau^2 + (\varepsilon/2)^2]/\sigma^2$. This
correlation is, therefore, higher than that between parent and offspring
when there is dominance; another reason for this higher
correlation is that the effects of environment on two brothers cannot
be regarded as independent if they are brought up together. For
double cousins, $\varphi = \varphi' = 1/4$, and $r = [(1/4)\tau^2 + (\varepsilon/4)^2]/\sigma^2$; this
correlation is higher than that between uncle and nephew.

The phenomenon of dominance is, thus, statistically expressed by 
correlation coefficients which are higher for the double relationships 
than for the corresponding simple relationships. This higher correla- 
tion decreases rapidly, however, as the relationship becomes more 
distant, because the product $\varphi\varphi'$ rapidly becomes negligible.
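
The bilinear formula for double relationships can be checked against a direct enumeration of the joint distribution of the four genic variables. The sketch below is only an illustration for a single pair of alleles: the function names are introduced here, and the genotypic values are a hypothetical, already centred example (complete dominance with $p = 1/4$).

```python
from fractions import Fraction

def pair_distribution(p, phi):
    """Joint law of two genic variables with correlation phi (1 = state t, 0 = state s)."""
    q = 1 - p
    return {
        (1, 1): p * (p + phi * q),
        (1, 0): p * q * (1 - phi),
        (0, 1): p * q * (1 - phi),
        (0, 0): q * (q + phi * p),
    }

def product_mean_by_enumeration(p, values, phi, phi_prime):
    """M(H1 H2 genotypic product) obtained by summing over all states of the four loci."""
    first = pair_distribution(p, phi)            # (H1, H2)
    second = pair_distribution(p, phi_prime)     # (H1', H2'), independent of the first pair
    total = Fraction(0)
    for (h1, h2), w in first.items():
        for (g1, g2), v in second.items():
            total += w * v * values[h1 + g1] * values[h2 + g2]
    return total

def product_mean_by_formula(p, values, phi, phi_prime):
    """[(phi + phi')/2] tau^2 + phi phi' eps^2 for one pair of alleles."""
    q = 1 - p
    k, j, i = values                             # indexed by the number of A genes: 0, 1, 2
    tau2 = 2 * p * q * (p * i + (q - p) * j - q * k) ** 2
    eps2 = p * p * q * q * (k - 2 * j + i) ** 2
    return (phi + phi_prime) / 2 * tau2 + phi * phi_prime * eps2

p = Fraction(1, 4)
values = (Fraction(-7, 8), Fraction(9, 8), Fraction(9, 8))  # k, j, i with zero mean
half = Fraction(1, 2)
print(product_mean_by_enumeration(p, values, half, half))   # full sibs
print(product_mean_by_formula(p, values, half, half))
```

The two computations agree because the joint law is linear in each of $\varphi$ and $\varphi'$, which is the property the text exploits to fix the coefficients.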


2.5.3 Various Extensions 


(1) The results are valid if there is multiallelism, because $V_1(x, y)$
is still a linear form of $\varphi$; therefore, $W(x, y)$, $\mathcal{M}(\mathcal{H}_1\mathcal{H}_2)$, and $r$ are
bilinear symmetrical forms of $\varphi$ and $\varphi'$ whose coefficients are determined
by setting $\varphi' = 0$, and then $\varphi = \varphi' = 1$.

(2) The results could be extended to the case where the effects of 
the different pairs of genes on the traits considered are not additive 
(generalization of dominance) [3, 11]. 

(3) The calculations could be modified to take into consideration 
the resemblance between parents (homogamy); the effect of doing 
so is to increase all the correlations [3, 11, 22, 23, 24]. 

(4) If we separate the sexes in the statistical measurements of the 
correlation, we find, in general, different results for each sex, because 
of the contribution of sex-linked genes to the trait considered, and 
the same calculation as in §2.4 could be applied [7, 8]. 


2.6 CORRELATIONS BETWEEN ANY 
INDIVIDUALS WITH DOMINANCE 


For two individuals, $I_1$ and $I_2$, with a coefficient of inbreeding not
equal to zero, the calculation of correlations is much less simple
when there is dominance, because the four random variables, $H_1$, $H_2$,
$H_1'$, $H_2'$, will be related among themselves. It then becomes indispensable
to determine the generating function of all four random
variables, which, for a given type of relationship, can only be done
step by step by the following method: given a group of individuals,
$I_1, I_2, \ldots, I_n$, let us designate by $P_{\alpha\alpha'\beta\beta'\ldots}$ the joint probability
that their $2n$ homologous loci are in the states represented by $\alpha$, $\alpha'$,
$\beta$, $\beta'$, ... (each one of these quantities having one of the values
$t$ or $s$). The generating function for the $2n$ loci will then be:

$$\Phi(a_1, a_2, b_1, b_2, \ldots) = \sum P_{\alpha\alpha'\beta\beta'\ldots}\,a_1^{\alpha}a_2^{\alpha'}b_1^{\beta}b_2^{\beta'}\cdots,$$


with the following properties.

If we bring together two groups of individuals with no correlation,
the functions $\Phi$ are multiplied.

If we disregard one of the individuals, for example, $I_1$, the generating
function for the remaining individuals can be deduced from
$\Phi$ by setting $a_1 = a_2 = 1$.

If we add to the group an offspring, $E$, from a couple of the group,
for example, an offspring of $I_1$ and $I_2$, the generating function of the
group thus increased will include two more variables, related to $E$,
say, $l_1$ and $l_2$; according to Mendel’s laws the generating function
will then be

$$\begin{aligned} \tfrac{1}{4}[&\Phi(a_1l_1, a_2, b_1l_2, b_2) + \Phi(a_1l_1, a_2, b_1, b_2l_2) \\ &+ \Phi(a_1, a_2l_1, b_1l_2, b_2) + \Phi(a_1, a_2l_1, b_1, b_2l_2)]. \end{aligned}$$
We can then proceed gradually from the probabilities of a given 
initial group to the probabilities of any group which was derived 
from it by given matings. The calculations, however, are rarely 
simple. 
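
The step-by-step method can be mimicked by carrying the joint distribution of all loci explicitly instead of its generating function. The sketch below is only an illustration under assumptions made here: two alleles coded 1 (for $A$) and 0 (for $a$), unrelated non-inbred founders, and hypothetical function names. Appending an offspring copies one locus from each parent, each of the four choices having probability 1/4, which corresponds to the four terms of the rule above.

```python
from fractions import Fraction
from itertools import product

def founders(n, p):
    """Joint distribution of the 2n loci of n unrelated, non-inbred founders."""
    dist = {}
    for state in product((1, 0), repeat=2 * n):
        prob = Fraction(1)
        for allele in state:
            prob *= p if allele == 1 else 1 - p
        dist[state] = prob
    return dist

def add_offspring(dist, parent_a, parent_b):
    """Append the two loci of an offspring of individuals parent_a and parent_b."""
    new = {}
    for state, prob in dist.items():
        for a, b in product((0, 1), repeat=2):
            child = (state[2 * parent_a + a], state[2 * parent_b + b])
            key = state + child
            new[key] = new.get(key, 0) + prob / 4
    return new

def genotype_probability(dist, individual, genotype):
    """Marginal probability that the two loci of `individual` are in the given states."""
    return sum(
        prob for state, prob in dist.items()
        if (state[2 * individual], state[2 * individual + 1]) == genotype
    )

p = Fraction(1, 3)
group = founders(2, p)                 # individuals 0 and 1
group = add_offspring(group, 0, 1)     # individual 2
group = add_offspring(group, 0, 1)     # individual 3, full sib of 2
group = add_offspring(group, 2, 3)     # individual 4, offspring of the two sibs
print(genotype_probability(group, 4, (1, 1)))  # compare with p(p + q/4), i.e. f = 1/4
```

For the offspring of full sibs the result agrees with the formula $p(p + fq)$ taken with $f = 1/4$; the number of states grows quickly with the size of the group, which reflects the remark that the calculations are rarely simple.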




Chapter 3


Evolution of a 
Mendelian Population 


We have discussed, thus far, only a stationary Mendelian population,
i.e., a population in which the frequency of any given gene
does not change from one generation to the next, a circumstance
that can occur only if the population is very large and if the different
alleles do not give their carriers either an advantage or a disadvantage
in the struggle for existence (i.e., all genes are neutral). We shall now
consider, first, a population of limited size, and later, a population 
in which there is selection of genes. We shall see that in such popu- 
lations the frequency of genes does not remain constant but changes 
in the course of time. We shall then have to answer two questions: 
Where does this evolution lead? At what rate does it take place? 


3.1 INFLUENCE OF POPULATION SIZE
ON NEUTRAL GENES 


3.1.1 Constant Population Size

Let us examine a population made up of a constant number of 
individuals, K, reproducing by random mating, and consider, first, 
genes whose mutation rate is negligible. Starting with an initial 
generation, $F_0$, we designate the successive generations, which we
shall assume to be nonoverlapping, by $F_1$, $F_2$, and so on; if generations
overlap, and mating between different generations is possible,
computations become more complicated, but the results are not
essentially modified. In spite of random mating, the individuals of
the $n$th generation, $F_n$, will certainly present some consanguinity if $n$
is sufficiently large, because each one will have at the most $K$ distinct
ancestors of order $n$, rather than the theoretical $2^n$ ancestors. We
could calculate the coefficient of coancestry of an individual only if
we knew all the chains of relationship connecting his two parents,
i.e., if we had complete records of mating since the beginning. We
shall see, however, that one can characterize a priori the average
coancestry of the $n$th generation by a number $f_n$. By definition, $f_n$
will be the probability, evaluated a priori, that the two homologous
loci of an individual taken at random in $F_n$ are identical, i.e., they
come from the same locus of a common ancestor. In each experiment
conducted, the a posteriori coefficient of coancestry will depend on
the individual considered, but given a large number of individuals,
$f_n$ will approximate the mean value.

Since the genes being considered are neutral, the a priori proba- 
bilities of the different alleles will be the same for all generations, 
that is, $p$ and $q$, if we assume two alleles only. The formulas
$p(p + f_nq)$, $2pq(1 - f_n)$, and $q(q + f_np)$ will represent the a priori
probabilities of the three genotypes for the $n$th generation, and also
their frequencies, given enough experiments, in which we would
always start with the same frequencies, $p$ and $q$, for genes $A$ and $a$.
We shall calculate $f_n$ in different cases, disregarding mutations.*


* The above formulas are based on the assumption that there is only random
inbreeding (consistent with panmixia). If there is also systematic inbreeding
(see p. 53), the formulas may be modified.


A. Dioecious Individuals. Consider first an animal population
with separate sexes, made up of constant numbers $N_1$ of males and
$N_2$ of females, forming the subpopulations ${}_1F$ of males and ${}_2F$ of
females. Since there is panmixia, the two homologous loci of an
individual, $I_n$, of $F_n$, are taken at random, one from ${}_1F_{n-1}$, the other
from ${}_2F_{n-1}$. The probability that they come from the same individual
of ${}_1F_{n-2}$ or of ${}_2F_{n-2}$ is

$$\frac{1}{2N_1}\cdot\frac{1}{2} + \frac{1}{2N_2}\cdot\frac{1}{2} = \frac{1}{N}$$

(by designating $N$ as the harmonic mean of $2N_1$ and $2N_2$, $1/N =
1/(4N_1) + 1/(4N_2)$). The complementary probability, $1 - (1/N)$, is
the probability that they come from different individuals of $F_{n-2}$. In
these two cases, the probabilities that the two loci are identical are
$(1 + f_{n-2})/2$ and $f_{n-1}$ (because $f_{n-1}$ represents the probability that
two homologous loci taken from two different individuals of $F_{n-2}$
are identical). Therefore, $f_n$, the probability that the two loci of $I_n$
are identical, is given by


$$f_n = \frac{1}{N}\cdot\frac{1 + f_{n-2}}{2} + \left(1 - \frac{1}{N}\right)f_{n-1}.$$
From this linear recurrence we can easily deduce $f_n$. First of all, we
return to a homogeneous recurrence by noting that the equation is
verified for $f_n = \text{constant} = 1$ and by letting $a_n = 1 - f_n$, from
which


$$a_n = [1 - (1/N)]a_{n-1} + (1/2N)a_{n-2}.$$


Here $a_n$ will be a linear combination of two solutions of the form $k^n$,
$k$ being given by the characteristic equation


$$k^2 - [1 - (1/N)]k - 1/2N = 0.$$
Thus,
$$a_n = \lambda\left[\left(1 - 1/N + \sqrt{1 + 1/N^2}\right)/2\right]^n + \mu\left[\left(1 - 1/N - \sqrt{1 + 1/N^2}\right)/2\right]^n,$$


$\lambda$ and $\mu$ being determined by the two initial values of $a_0$ and $a_1$,
$$a_0 = \lambda + \mu, \qquad a_1 = a_0[(1 - 1/N)/2] + (\lambda - \mu)\sqrt{1 + 1/N^2}/2.$$


Let us note that the indefinite brother-sister mating, studied in 
detail by Haldane and by Fisher, becomes a special case in this 
formula if we let N = 2: 


$$\alpha_n = \lambda\left[\frac{1 + \sqrt{5}}{4}\right]^n + \mu\left[\frac{1 - \sqrt{5}}{4}\right]^n.$$
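
As a numerical check of the recurrence above, here is a minimal Python sketch. The function names `alpha_sequence` and `dominant_root` are illustrative and do not come from the text, and the starting values $\alpha_0 = \alpha_1 = 1$ are an assumption chosen for the example. It iterates $\alpha_n = (1 - 1/N)\alpha_{n-1} + \alpha_{n-2}/2N$ directly and compares the per-generation decay with the larger characteristic root.

```python
import math


def alpha_sequence(n_gen, N, alpha0=1.0, alpha1=1.0):
    """Return [alpha_0, ..., alpha_n_gen] for the dioecious recurrence
    alpha_n = (1 - 1/N) * alpha_{n-1} + alpha_{n-2} / (2N)."""
    alpha = [alpha0, alpha1]
    for _ in range(2, n_gen + 1):
        alpha.append((1 - 1 / N) * alpha[-1] + alpha[-2] / (2 * N))
    return alpha[: n_gen + 1]


def dominant_root(N):
    """Larger root of k^2 - (1 - 1/N) k - 1/(2N) = 0."""
    return (1 - 1 / N + math.sqrt(1 + 1 / N**2)) / 2


# Brother-sister mating (N = 2): the decay ratio approaches (1 + sqrt(5)) / 4.
sib = alpha_sequence(60, N=2)
sib_ratio = sib[60] / sib[59]

# Large N (here N = 500, an arbitrary choice): alpha_n stays close to
# alpha_1 * exp(-n / 2N).
large = alpha_sequence(1000, N=500)
```

With these assumed starting values, `sib_ratio` should agree with `dominant_root(2)` to many decimal places, and `large[1000]` should lie within about one per cent of $e^{-1}$.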


If $N$ is large, we have

$$\lambda\sqrt{1 + 1/N^2} = \left(\sqrt{1 + 1/N^2} - 1 + 1/N\right)\frac{\alpha_0}{2} + \alpha_1, \qquad \lambda \approx \alpha_1 + \frac{\alpha_0}{2N}.$$

Therefore $\lambda$ has for its principal part $\alpha_1$ if $\alpha_1 \neq 0$, that is, if the
initial population is not formed by identical homozygotes. The term
in $\mu$ becomes rapidly negligible with respect to the term in $\lambda$, because
their ratio is equivalent to $(-1/2N)^n \mu/\lambda$. We have, then, as $n$ starts
to increase,

$$\alpha_n = 1 - f_n \approx \alpha_1 (1 - 1/2N)^n \approx \alpha_1 e^{-n/2N},$$


and we are led to the following important conclusion: $f_n$ tends toward
1 when $n$ tends toward infinity. Thus, we tend asymptotically toward
a population in which the two homologous loci of each individual 
would have the probability 1 of being identical, and, therefore, a 
population in which all the loci would be identical, made up of 
identical homozygotes. For neutral genes and with no mutations, 
indefinite panmixia in a limited population always leads to complete 
homogeneity. This result, surprising at first, stems from the fact that 
a gene can be eliminated when the random drawing of the $2N$ loci
of the following generation happens to always favor the same one of
the two alleles; on the other hand, a gene, once eliminated, never
reappears. The a priori probabilities of genes $A$ and $a$ in the $F_n$
generation are certainly always constant and equal to $p$ and $q$, but
this now means that the final population has the probability p of 
containing only AA’s and the probability q of containing only aa’s. 
Here is a large difference from the case of an unlimited population, 
in which the three genotypes coexist indefinitely with frequencies $p^2$,
$2pq$, $q^2$. But it must be noted that the asymptotic homogeneity is
reached extremely slowly if $N$ is large; for $\alpha_n = 1 - f_n$ to be reduced
to one-tenth of its value, a number, $n$, of generations is required such
that $\exp(-n/2N) = 0.1$, and therefore $n = 2N \ln 10$; to appreciably
reduce the difference $1 - f$, which measures the deviation from
homogeneity, requires about as many generations as there are
individuals in the population. These results have important biological
consequences; several biologists have insisted on the role of chance 
in the elimination of neutral genes. We observe, in fact, in many 
animal and plant species, the divergence of “‘geographical races,” 
which, after having been separated by a barrier, such as a body of 
water or a range of mountains, evolve toward different homozygous 
states, one having finally only genotypes AA, the other having only 
aa. That divergence could certainly be explained by disruptive selec- 
tion depending on the geographic situation, the gene A being advan- 
tageous in one location, the gene a in another. As frequently happens 
for neutral genes, however, it must be admitted that this evolution 
results from a small population becoming homogeneous; this ho- 
mogenization arises from random elimination, which in its course 
eliminates sometimes one of the two genes, sometimes the other. 
This explanation has at times been used improperly; it must be 
emphasized that random elimination cannot take place in such a 
short period of time unless the population is very small. Consider 
the blood group of the American Indians. All these Indians seem to 
come from the same ancestors, in spite of their morphological varia- 
bility and limited intermarriage with immigrants from Oceania and 
Melanesia. They are the only race in the world to have exclusively 
only one blood group, the group O (OO). 

The blood groups A and B, however, result from extremely old 
mutations, since they probably existed before the separation of the 
lines of chimpanzees and men, and must have always existed in far 
eastern Asia, where the great human migrations probably originated. 
The different blood groups seem to be without selective value,
because they coexist in Asia and in Europe under all climates and at all
latitudes. It seems, then, that Indians derive from a group of Asiatic 
immigrants in which the genes A and B disappeared by random 
elimination. The group must have developed rapidly, however, after 
its arrival in the new world; it could not have remained small for 
more than a few generations, after which the change in gene fre- 
quencies had to be very slow. To reach homozygosity within a few 
generations, the group would have consisted of only a very few 
individuals. The hypothesis of random elimination of genes A and B 
in America thus leads us to consider that most of the American 
Indians derive genetically from a small number of Asiatics 
(Mongoloids), who came to America perhaps by crossing over the 
Bering straits, and to confirm the thesis of American ethnographers, 
but not the thesis that the American Indian race resulted from 
hybridization among Mongoloids, Australians, and Melanesians who 
came at different times by sea. Immigration from Melanesia has had 
obvious influence only on very isolated regions, such as the Siriono 
area (the virgin forest of Amazonia). Our hypothesis is further cor- 
roborated by the observation that, although the NN and MN blood 
groups of the MM-MN-NN series occur quite frequently among all 
races, they hardly ever occur among Indians. Certainly, there are 
many other genes for which the American Indian population includes 
heterozygotes, but these genes may have originated, for the most part, 
from mutations which occurred after the occupation of America. 
The slowness of random elimination of genes in a population which 
numbers even a few hundred individuals is confirmed by the example 
of the Gypsies, nomads who came from India into Europe more than 
a thousand years ago, and who have conserved remarkably the 
Hindu type, because they marry almost exclusively among them- 
selves. The few thousand individuals that these isolated populations 
number in Germany and in France have conserved the same fre- 
quency of blood groups as the Hindus, in spite of the thousand-year 
separation, being 40 per cent B, the highest proportion in the world. 
B. Monoecious Individuals. Consider now a plant population of
$N$ monoecious individuals, in which both sexes occur on the same
plant. Self-fertilization is now possible, but suppose that it is no more
or less probable than cross-fertilization. The two homologous loci
of an individual $I_n$ of $F_n$ then have the probability $1/N$ of coming
from the same individual of $F_{n-1}$, in which case their conditional
probability of being identical is $(1 + f_{n-1})/2$, and have the
probability $1 - 1/N$ of coming from different individuals, in which case
their conditional probability of being identical we denote by $\varphi_n$; then,

$$f_n = \frac{1 + f_{n-1}}{2N} + \left(1 - \frac{1}{N}\right)\varphi_n.$$

We have defined $\varphi_n$ as the probability that two homologous loci
taken from two different individuals of $F_{n-1}$ are identical. Because
of panmixia, $\varphi_n = f_{n-1}$; therefore

$$f_n = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) f_{n-1},$$

from which

$$\alpha_n = 1 - f_n = \left(1 - \frac{1}{2N}\right)\alpha_{n-1} = \alpha_0\left(1 - \frac{1}{2N}\right)^n.$$


Thus $\alpha_n$ still tends toward zero, and $f_n$ toward 1. The classical case
of indefinite self-fertilization is obtained for $N = 1$; $1 - f_n$ then
decreases by half in each generation, and almost complete homozygosity
is reached quite rapidly. Repeated self-fertilization of a plant
species is a rapid procedure for obtaining a line homozygous for
almost all factors. But if $N$ is large, homogeneity is established very
slowly; $\alpha_n$ is then of the order of $\exp(-n/2N)$, as with dioecious
individuals. The occurrence of both sexes on the same plant modifies
the evolution of the population to an insignificant extent, provided
self-fertilization is not favored more than cross-fertilization, since
we have already shown that exclusive self-fertilization rapidly leads
to homogeneity.
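
The geometric decay for monoecious individuals, and the earlier estimate $n = 2N \ln 10$ for a tenfold reduction of $1 - f$, can be illustrated with a short Python sketch. The helper names `monoecious_alpha` and `generations_to_reduce` are hypothetical, and $\alpha_0 = 1$ is an assumed starting value.

```python
import math


def monoecious_alpha(n, N, alpha0=1.0):
    """alpha_n = alpha_0 * (1 - 1/(2N))**n, computed generation by generation."""
    alpha = alpha0
    for _ in range(n):
        alpha *= 1 - 1 / (2 * N)
    return alpha


def generations_to_reduce(N, factor=10.0):
    """Smallest n such that (1 - 1/(2N))**n <= 1/factor."""
    return math.ceil(math.log(factor) / -math.log(1 - 1 / (2 * N)))


# Self-fertilization (N = 1): 1 - f halves in every generation.
selfing_after_3 = monoecious_alpha(3, N=1)

# Larger population: the exact count is close to 2N ln 10.
n_exact = generations_to_reduce(1000)
n_approx = 2 * 1000 * math.log(10)
```

For $N = 1$ only four generations are needed to go below one-tenth, whereas for $N = 1000$ (an arbitrary example value) the count is of the order of several thousand generations, in line with $2N \ln 10$.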
3.1.2 Population Size Not Constant


Suppose now that population size is not constant but varies over
the course of time. Consider the case in which the sexes are separate.
The numbers $N_1$ and $N_2$ will be functions of the order, $i$, of the
generation $F_i$. Let us set $1/4N_1 + 1/4N_2 = 1/N(i)$.

The formula

$$f_n = \frac{1 + f_{n-2}}{2N} + \left(1 - \frac{1}{N}\right) f_{n-1}$$

holds, provided we substitute for $N$ the value $N(n - 2)$. From this,
by increasing the indices by two to simplify the formula, we deduce

$$\alpha_{n+2} = [1 - 1/N(n)]\,\alpha_{n+1} + \alpha_n/2N(n).$$

We have this time a linear homogeneous recurrence with variable
coefficients. We shall solve it by setting $\alpha_n = k_0 k_1 \cdots k_n$, the $k_i$'s
being constants to be determined, which are related as follows:

$$k_{n+2}\,k_{n+1} = [1 - 1/N(n)]\,k_{n+1} + 1/2N(n);$$

that is,

$$k_{n+2} = [1 - 1/N(n)] + \frac{1}{2N(n)\,k_{n+1}},$$

which enables us to calculate gradually the $k_i$'s, starting with $k_1 > 0$.
Then $k_{n+2} > 0$; for $n \geq 0$, $k_{n+2} > 1 - 1/N(n)$; and, for $n \geq 1$,
$k_{n+2} < 1$, because $k_{n+2} < 1$ is equivalent to $k_{n+1} > 1/2$. Therefore
the values taken by $\alpha_n$ are positive and decreasing, and, when $n$ tends
toward infinity, tend toward a limit, $\alpha \geq 0$; and $f$ therefore tends
toward the limit $(1 - \alpha) \leq 1$.

To make $\lim f = (1 - \alpha) < 1$, that is, for the heterozygotes never
to be completely eliminated, it is necessary and sufficient that the
series $\log \alpha = \log k_0 + \log k_1 + \cdots + \log k_n + \cdots$ converge. For
this $k_n$ must tend toward 1, which necessitates, by the recurrence
formula, that $N(n)$ become infinite along with $n$; this last condition
suffices for $k_n \to 1$, because, by letting $k_{n+2} = 1 - u_{n+2}$, where
$0 < u_{n+2} < 1/N(n)$, the recurrence may be written

$$u_{n+2} = \frac{1}{2N(n)}\cdot\frac{1 - 2u_{n+1}}{1 - u_{n+1}} < \frac{1}{2N(n)}.$$

Therefore $u_{n+2} \to 0$ if $N(n) \to \infty$. It follows that:

(1) If $N(n)$ remains finite when $n \to \infty$, $\lim k_n < 1$, and
$\lim \log k_n < 0$; therefore $\log \alpha = -\infty$, $\alpha = 0$, and $f$ tends
toward 1.

(2) If $N(n)$ tends toward infinity along with $n$, $k_n = 1 - u_n \to 1$,
and $\log k_n = \log(1 - u_n) \to 0$. To study the series with general
terms $\log k_n$, let us note that

$$u_{n+2} = (1 - \varepsilon_n)/2N(n),$$

where

$$\varepsilon_n = u_{n+1}/(1 - u_{n+1}) \to 0.$$

Therefore

$$u_{n+2} \sim 1/2N(n),$$

and the series $\sum \log k_{n+2} = \sum \log(1 - u_{n+2})$ converges if the series
$\sum u_{n+2}$ converges.

(3) If $N(n)$ increases at the most as a linear function of $n$, the
series diverges, and $f \to 1$.

(4) If $N(n)$ increases at least as $n^{1+k}$ ($k > 0$), the series converges,
$\lim f < 1$, and there is no complete disappearance of heterozygotes.

The same results would obtain for monoecious plants. 
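
The contrast between cases (3) and (4) can be explored numerically. The following Python sketch is only an illustration: the function name `alpha_after`, the callback `N_of`, and the three growth laws ($N = 10$, $N = 10 + n$, $N = 10 + n^2$) are assumptions made for the example, with $\alpha_0 = \alpha_1 = 1$ as assumed starting values.

```python
def alpha_after(n_gen, N_of, alpha0=1.0, alpha1=1.0):
    """Iterate alpha_{n+2} = (1 - 1/N(n)) * alpha_{n+1} + alpha_n / (2 N(n))
    for n = 0, ..., n_gen - 1 and return the last value obtained."""
    a_prev, a_cur = alpha0, alpha1
    for n in range(n_gen):
        size = N_of(n)
        a_prev, a_cur = a_cur, (1 - 1 / size) * a_cur + a_prev / (2 * size)
    return a_cur


# Constant size: alpha -> 0, so f -> 1.
constant = alpha_after(2000, lambda n: 10)

# Linear growth: the series still diverges, alpha keeps drifting toward 0.
linear = alpha_after(20000, lambda n: 10 + n)

# Quadratic growth: the series converges, alpha settles at a positive limit.
quadratic_10k = alpha_after(10000, lambda n: 10 + n * n)
quadratic_20k = alpha_after(20000, lambda n: 10 + n * n)
```

Under these assumptions the quadratic case stabilizes (the two values differ only slightly), while the linear case continues to decrease, although slowly.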


3.1.3 The Role of Mutations 


It is obvious that the genetic heterogeneity of a population, i.e.,
the presence of numerous heterozygotes, does not usually result
from the fact that the population is extremely large, but from new
genes appearing from time to time, either by mutation or by migration
of individuals from a different population. Let $u_1$ be the mean
frequency of mutation per generation for a specified locus, and
$u_2$ the mean frequency of migrants per generation, these migrants
coming from a population large enough that we can assume there
is no relationship among them. Let us set $u = u_1 + u_2$. The probability
that a locus, $A_n$, comes from a nonmutated locus of an
"indigenous" individual (i.e., a nonmigrant) of the preceding generation
is $1 - u_1 - u_2 = 1 - u$. As a result, in the case of dioecious
individuals,

$$f_n = (1 - u)^4\left[\frac{1 + f_{n-2}}{2N} + \left(1 - \frac{1}{N}\right)\varphi_{n-1}\right],$$

$\varphi_{n-1}$ representing the probability that two loci taken from two different
indigenous individuals of $F_{n-2}$ are identical. But the coefficient
of coancestry, $f_{n-1}$, of an individual of $F_{n-1}$ is evidently $(1 - u)^2\varphi_{n-1}$.
From this we deduce

$$f_n = (1 - u)^4\,\frac{1 + f_{n-2}}{2N} + (1 - u)^2\left(1 - \frac{1}{N}\right) f_{n-1}.$$

Since we can assume $u^2$ to be negligible, the equilibrium value
of $f_n$ is

$$f_\infty = \frac{1 - 4u}{1 + 4Nu} \approx \frac{1}{1 + 4Nu}.$$

To see how $f$ tends toward this limit, let us set $\alpha_n = f_n - f_\infty$. We
have

$$\alpha_n = \frac{(1 - 4u)\,\alpha_{n-2}}{2N} + (1 - 2u)\left(1 - \frac{1}{N}\right)\alpha_{n-1}.$$

This equation has two solutions, $\alpha_n = k^n$, $k$ being a root of

$$k^2 - (1 - 2u)(1 - 1/N)k - (1 - 4u)/2N = 0.$$

Therefore

$$2k = (1 - 2u)(1 - 1/N) \pm \sqrt{(1 - 2u)^2(1 - 1/N)^2 + 2(1 - 4u)/N}.$$

The greater of the two roots is given by:

$$2k = 1 - 2u - 1/N + (1 - 4u - 2/N + 8u/N + 2/N - 8u/N)^{1/2} + O(u^2) + O(1/N^2);$$

$$k = 1 - 2u - 1/2N + O(u^2) + O(1/N^2).$$

Therefore

$$\alpha_n = f_n - f_\infty \sim (1 - 2u - 1/2N)^n \sim e^{-2nu - n/2N} = e^{-(4Nu + 1)n/2N}.$$
The equilibrium value is reached more rapidly when there is mutation
or migration and much more rapidly if $4Nu$ is large. In monoecious
plants, we find again the same results.
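
A small Python sketch can illustrate how mutation and migration hold the coefficient of coancestry at an equilibrium below 1. It assumes the dioecious recurrence in the form $f_n = (1-u)^4(1 + f_{n-2})/2N + (1-u)^2(1 - 1/N)f_{n-1}$, which is the form consistent with the linearized coefficients $(1 - 4u)$ and $(1 - 2u)$ used in this section; the function names and the parameter values are illustrative only.

```python
def f_with_mutation(n_gen, N, u, f0=0.0, f1=0.0):
    """Iterate the coancestry recurrence with combined mutation/migration rate u.

    Assumed form: f_n = (1-u)^4 (1 + f_{n-2}) / (2N) + (1-u)^2 (1 - 1/N) f_{n-1}.
    Returns the value reached after n_gen further generations.
    """
    f_prev, f_cur = f0, f1
    for _ in range(n_gen):
        f_next = ((1 - u) ** 4) * (1 + f_prev) / (2 * N) \
            + ((1 - u) ** 2) * (1 - 1 / N) * f_cur
        f_prev, f_cur = f_cur, f_next
    return f_cur


def f_equilibrium_approx(N, u):
    """Approximate equilibrium 1 / (1 + 4Nu), valid when u is small."""
    return 1 / (1 + 4 * N * u)


# Example values (assumed): N = 1000, u = 0.001, so 4Nu = 4.
f_limit = f_with_mutation(10000, N=1000, u=0.001)
f_approx = f_equilibrium_approx(1000, 0.001)

# Without mutation or migration the same recurrence drifts to f = 1.
f_no_mutation = f_with_mutation(5000, N=10, u=0.0)
```

With $4Nu = 4$ the iterated value settles near $0.2$, while setting $u = 0$ recovers the earlier result that $f$ tends toward 1.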

In summary, we note that the coefficient of coancestry $f_n$ always
tends toward a finite limit $f$. If it is equal to 1, the population will
almost certainly become genetically homozygous, given sufficient
time. If it is different from 1, usually some heterozygotes will persist
in the final population, the a priori probability that an individual
taken at random from this population is heterozygous being
$2pq(1 - f)$. If the population size cannot be taken as increasing
indefinitely, $f = 1/(1 + 4Nu)$ is considerably less than 1, provided
that $4Nu$ is not small, that is, that the frequency $u$ of mutation and
migration per generation is on the order of $1/N$, at least, or that the
total number, $2Nu$, of new genes introduced in each generation by
mutation or migration is one or more. Whatever the population
might be, if mutation or migration affects some individuals in each
generation, a considerable number of heterozygotes may persist.


3.2 INFLUENCE OF SELECTION


Let us study the distribution, over the course of time, of a pair of 
factors having only two alleles, A and a. 

Let us designate by $p$ and $q$, $p = 1 - q$, the probabilities of $A$
and $a$ in the adult breeding individuals of the generation $F_n$. The
probabilities $p + \delta p$ and $q + \delta q$ ($\delta q = -\delta p$) in the following
generation, $F_{n+1}$, will be, in general, different from $p$ and $q$. The change $\delta q$
results from several causes, each producing a small change (such
that their squares and products would be negligible).


A. Mutation. Because of recurrent and reversible mutations,
there is, in each generation, in the reproductive cells of $F_n$, a mean
proportion, $u_1$, of $a$ genes transformed to $A$, and another proportion,
$v_1$, of $A$ genes transformed to $a$, which gives $q - u_1 q + v_1(1 - q)$
as the average frequency of $a$ in the reproductive cells; the change
of $q$ produced by mutation is a linear function of $q$.
B. Migration. We must take migration into account as soon as
we consider a local population instead of all the individuals of a
species, since a local population is almost never completely isolated;
it always exchanges individuals with the neighboring populations.
It follows that, if what we called $F_n$ designates all the indigenous
individuals born at a certain place, the breeding population will
differ from $F_n$; it will be formed, on the average, of only a fraction,
$1 - k$, of $F_n$ individuals, the remaining fraction, $k$, consisting of
migrant individuals. If we assume that these individuals come from
a group of populations whose composition can be considered constant
over the course of time and characterized by a frequency, $q_m$,
of the gene $a$, the mean frequency of $a$ in the breeding population
will be

$$(1 - k)q + kq_m = q + k(q_m - q).$$

The change in $q$ caused by migration is, therefore, a linear function
of $q$, as with mutations. It can be written in the same form,
$-u_2 q + v_2(1 - q)$, by setting $kq_m = v_2$ and $k = u_2 + v_2$, that is,
$u_2 = k(1 - q_m)$. The change produced jointly by mutation and
migration is then

$$\delta_1 q = -u_1 q + v_1(1 - q) - u_2 q + v_2(1 - q) = -uq + v(1 - q),$$

by setting

$$u = u_1 + u_2 = u_1 + k(1 - q_m) \quad \text{and} \quad v = v_1 + v_2 = v_1 + kq_m.$$

This change is therefore a linear function of $q$.

The inclusion of both mutation and migration effects in the same
formula is based on a simplified, gross model which assumes that
the migrants come from outside populations whose composition
remains constant in time. In reality, these populations undergo
evolution, and are themselves affected by migration. What should
be studied, then, is the evolution of a group of populations interacting
with each other by migration (see §3.3).

For the time being we shall assume that the reproductive cells of
the $F_n$ generation, because of mutation and migration, carry the gene $a$
with the frequency $q_1 = q + \delta_1 q = q - uq + v(1 - q)$.


C. Gametic Selection. Assume that the gametes produced by
these reproductive cells do not carry the gene $a$ with the probability $q_1$
any longer, but with a different probability, $q_2$, because this gene
presents an advantage or a disadvantage for the gametes which
carry it (gametic selection); assume, also, that the probabilities of
the two genes, instead of being $q_1$ and $1 - q_1$, are $q_2 = \alpha q_1$ and
$1 - q_2 = \beta(1 - q_1)$, $\alpha$ and $\beta$ having a constant ratio close to 1,
designated by $1 - s$. This ratio characterizes the degree of viability
of the gametes, i.e., the intensity of "gametic selection" (we can, if
necessary, consider $s$ positive, by calling $a$ the unfavorable gene).

Since we must have

$$\alpha q_1 + \beta(1 - q_1) = 1, \quad \text{with } \alpha/\beta = 1 - s,$$

we have

$$\beta[(1 - s)q_1 + 1 - q_1] = 1,$$
$$1/\beta = 1 - sq_1,$$

and

$$\alpha = \frac{1 - s}{1 - sq_1} = 1 - s + sq_1 + O(s^2).$$

Let us put $q_2 = \alpha q_1 = q_1 + \delta_2 q$. Then $\delta_2 q = (\alpha - 1)q_1 = -sq_1(1 - q_1)$.


D. Consanguinity. Let us assume pure consanguinity, because
of which each gamete contributing to reproduction, whatever its
constitution may be, has the same probability of uniting with another
gamete; the eventual consanguinity, however, will increase the
probability that the other gamete carries the same gene as the first one.
If we call $\lambda$ the average coefficient of inbreeding of generation $F_{n+1}$,
the gametes that unite to form the individuals born, or "zygotes,"
of generation $F_{n+1}$, have among themselves on the average a
conditional* correlation coefficient $\lambda$, and each of them carries $A$ or $a$ with
the probabilities $p_2$ and $q_2$; the three zygotes $AA$, $Aa$, $aa$ have then
the probabilities

$$P = p_2(p_2 + \lambda q_2), \qquad 2Q = 2p_2 q_2(1 - \lambda), \qquad R = q_2(q_2 + \lambda p_2).$$


* This conditional correlation coefficient, which equals zero in case of random
mating, is not the same as the a priori coefficient of §3.1, which was larger than
zero.


E. Zygotic Selection. Assume that the three zygotes do not have
the same probability of developing and reaching the adult reproductive
stage and that the probabilities of the adults are $\nu P$, $2\mu Q$, and $\gamma R$.
The three quantities $\nu$, $\mu$, and $\gamma$ have constant ratios close to 1:

$$\frac{\gamma}{\nu} = 1 - \sigma; \qquad \frac{\mu}{\nu} = 1 - h\sigma.$$

The two constants $\sigma$ and $h$ characterize the
degree of viability of the zygotes, or the intensity of "zygotic selection."
The heterozygotes will be intermediate in viability between
the two homozygotes if $0 < h < 1$; they will be superior to both
homozygotes if $h < 0$ (and inferior to both if $h > 1$) and if $\sigma > 0$,
and vice versa if $\sigma < 0$. We must have

$$\nu P + 2\mu Q + \gamma R = 1,$$
$$\nu[P + 2(1 - h\sigma)Q + (1 - \sigma)R] = 1,$$

from which

$$\nu = \frac{1}{1 - \sigma R - 2h\sigma Q} = 1 + \sigma R + 2h\sigma Q + O(\sigma^2).$$

The probability $q_2 + \delta_3 q$ of $a$ in the adult breeding individuals of
$F_{n+1}$ is therefore

$$q_2 + \delta_3 q = \mu Q + \gamma R = \nu[(1 - h\sigma)Q + (1 - \sigma)R] = \frac{q_2 - h\sigma Q - \sigma R}{1 - \sigma R - 2h\sigma Q},$$

and by grouping the terms in $Q$ and $R$ and reducing the denominator
to 1, we obtain

$$\delta_3 q = h\sigma p_2 q_2(1 - \lambda)(2q_2 - 1) - \sigma p_2 q_2(q_2 + \lambda p_2) + O(\sigma^2)$$
$$\phantom{\delta_3 q} = \sigma p_2 q_2[(2h - 1)(1 - \lambda)q_2 + h\lambda - \lambda - h] + O(\sigma^2).$$

We can replace $q_2$ by $q_1$ and $p_2$ by $p_1$, which changes the result by just $O(s\sigma)$ [since
$q_2 = q_1 + O(s)$]. We note then that the total change, $\delta_2 q + \delta_3 q$, due
to gametic and zygotic selection, is of the form

$$\delta_2 q + \delta_3 q = q_1(1 - q_1)(t + wq_1)$$

(disregarding the second-order terms in $s$ and $\sigma$), with

$$t = -s - \lambda\sigma - h\sigma(1 - \lambda), \quad \text{and} \quad w = \sigma(2h - 1)(1 - \lambda).$$
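
To see how closely the first-order expression $q_1(1 - q_1)(t + wq_1)$ follows the exact one-generation change, the two can be compared numerically. The sketch below is a hedged illustration: it assumes the gametic ratio $\alpha/\beta = 1 - s$, the zygote probabilities $P$, $2Q$, $R$ with gametic correlation $\lambda$, and relative viabilities $1 : 1 - h\sigma : 1 - \sigma$ for $AA$, $Aa$, $aa$; the function names and the numerical values are not from the text.

```python
def exact_selection_step(q1, s, sigma, h, lam):
    """Exact change in the frequency of gene a after gametic, then zygotic selection."""
    # Gametic selection: q2 = alpha * q1 with alpha = (1 - s) / (1 - s * q1).
    q2 = (1 - s) * q1 / (1 - s * q1)
    p2 = 1 - q2
    # Zygote probabilities with correlation lam between uniting gametes.
    P = p2 * (p2 + lam * q2)
    Q = p2 * q2 * (1 - lam)  # half of the heterozygote probability 2Q
    R = q2 * (q2 + lam * p2)
    # Zygotic selection, normalized so that adult probabilities sum to 1.
    total = P + 2 * (1 - h * sigma) * Q + (1 - sigma) * R
    q3 = ((1 - h * sigma) * Q + (1 - sigma) * R) / total
    return q3 - q1


def approx_selection_step(q1, s, sigma, h, lam):
    """First-order change q1 (1 - q1) (t + w q1)."""
    t = -s - lam * sigma - h * sigma * (1 - lam)
    w = sigma * (2 * h - 1) * (1 - lam)
    return q1 * (1 - q1) * (t + w * q1)


# Assumed small coefficients: s = 0.001, sigma = 0.002, h = 0.3, lambda = 0.1.
example = (0.4, 0.001, 0.002, 0.3, 0.1)
exact_change = exact_selection_step(*example)
approx_change = approx_selection_step(*example)
```

For coefficients of this size the two values differ only by terms of second order in $s$ and $\sigma$.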


If $s$, $\sigma$, and $h\sigma$ have the same sign, $t$ will have the opposite sign;
$t$ will be called the coefficient of total selection. These two selections
together produce, therefore, a change which is a third-degree
function in $q$, becoming equal to zero for $q = 0$ and $q = 1$. In fact, the
selection ceases to operate when $a$ is eliminated ($q = 0$) or is fixed
($q = 1$). An essential difference exists between the above change and
the change produced by mutations or migration. The latter is a
first-degree function, and does not become equal to zero at the limits
because it always affects a gene even if it is fixed or eliminated.

The function is reduced to the second degree if $w = 0$, i.e., if
$\sigma = 0$ (selection exclusively gametic), or if $h = 1/2$ (heterozygote
exactly intermediate from the point of view of viability), or if $\lambda = 1$
(population consisting of homozygotes, exclusively).

We can replace $q_1$ by $q$ in the formula if the second-order terms
in $u$, $v$, $s$, and $\sigma$ are disregarded. We then obtain for the total change,
$\delta q = \delta_1 q + \delta_2 q + \delta_3 q$, in one generation,

$$\delta q = \underbrace{-uq + v(1 - q)}_{\text{mutation and migration}} + \underbrace{q(1 - q)(t + wq)}_{\text{selection}}.$$

This third-degree polynomial we will call $\delta(q)$. Its coefficients are so
small that their products and squares can be disregarded.* It gives
the changes in the probability $q$ of $a$, as well as in its frequency in a
very large population, caused by mutation, migration, and selection.


3.2.1 The Case of a Very Large Population 


The difference between the probability and the frequency of a
gene in a very large population is negligible. The frequency $q$ varies
from one generation to the next by a quantity $\delta(q)$, supposed to be
small; $q$ is a function of time, measured in generations, whose finite
difference is the function $\delta(q)$. The integration of this difference
equation is approximately reduced to the following quadrature:

$$t = \int_{q_0}^{q} \frac{dq}{\delta(q)},$$

$q_0$ being the initial value in the generation $t = 0$.

We shall proceed to obtain the limit of $q$ when $t \to +\infty$ by a
graphic discussion (asymptotic distribution of the genes). Let us
assume that the rates of mutation and migration, $u$ and $v$, are not
equal to zero. We note then that the third-degree polynomial $\delta(q)$
is equal to $v > 0$ for $q = 0$ and to $-u < 0$ for $q = 1$. It does not
reduce to zero for $q = 0$ or for $q = 1$, and it allows either one or
three roots between 0 and 1. To be more specific,

$$\frac{\delta(q)}{q(1 - q)} = t + wq - \left[\frac{u}{1 - q} - \frac{v}{q}\right].$$

Let us represent in a plane $(q, y)$ the straight line $D$, generated by
$y_1 = t + wq$, and the curve $C$, generated by

$$y_2 = \frac{u}{1 - q} - \frac{v}{q}.$$

* We could, of course, formulate $\delta(q)$ without making these approximations,
but the expression obtained would be unmanageable except in such special cases
as the study of lethal genes by Teissier [21], where $aa$ is nonviable and $\sigma = 1$;
$\sigma^2$ would not be negligible, but the formula would nevertheless be simplified,
because there would be only two genotypes present.

The curve rises from the point $(0, -\infty)$ to the point $(1, +\infty)$, because

$$\frac{dy_2}{dq} = \frac{u}{(1 - q)^2} + \frac{v}{q^2} > 0.$$

It crosses the horizontal axis at the point $q = v/(u + v)$. (See Figure 5.)


FIGURE 5.


(A) Let us assume that curve $C$ meets the straight line $D$ in only
one point, $Q$, of abscissa $\bar q$. Since $\delta(\bar q) = 0$, an initial frequency,
$q_0$, that was equal to $\bar q$ would remain constant through the generations
(stationary frequency). In the general case $\delta(q)$ has the sign of
$y_1 - y_2$; therefore, $\delta(q) > 0$ if $q < \bar q$ and $< 0$ if $q > \bar q$; $\delta(q)$ is,
therefore, always opposite in sign to $q - \bar q$. The difference $q - \bar q = r$
decreases constantly in absolute value from its initial value,
$r_0 = q_0 - \bar q$; to see if it tends toward zero, and at what rate, let us
study the quotient $-\delta(q)/(q - \bar q)$, which
is a polynomial of at most the second degree, positive, and never
equal to zero. Let us call $m > 0$ its minimum in the range of values
taken by $q$, i.e., between $q_0$ and $\bar q$; if $q_0$, and consequently $q$, is
sufficiently close to $\bar q$, one could write essentially $\delta(q) = \delta'(\bar q)(q - \bar q)$,
and, thus, take approximately $m = -\delta'(\bar q)$. Thus, with $\Delta r = r' - r$
designating the change in $r$ from one generation to the next, we have

$$\frac{-\Delta r}{r} \geq m, \qquad \frac{-\Delta|r|}{|r|} \geq m, \qquad \Delta|r| \leq -m|r|,$$

$$|r'| = |r| + \Delta|r| \leq (1 - m)|r|.$$

After $n$ generations, $|r| \leq (1 - m)^n|r_0|$; therefore, $r = q - \bar q$ tends
toward zero at least as fast as $(1 - m)^n$ does. The stationary
frequency $q = \bar q$, considered earlier, is stable, and any other frequency
tends asymptotically toward it, the deviation $r = q - \bar q$ being
multiplied after $n$ generations by a quantity at most equal to $(1 - m)^n$.

There are two important specific cases.

(1) In the first one there is no selection; mutations and migration
act alone; $w = t = 0$; and $D$ coincides with the horizontal axis. The
asymptotic value, $\bar q$, is equal to

$$\bar q = \frac{v}{u + v} \qquad (\bar q = 1/2 \text{ if } u = v);$$

$$\delta(q) = -uq + v(1 - q) = -(u + v)(q - \bar q).$$

Therefore, $m = u + v$; $q - \bar q$ is multiplied in $n$ generations by
$(1 - u - v)^n$. This reduction is not significant unless
$n$ is on the order of $1/(u + v)$; if $u$ and $v$ are reduced to the rate of
mutation, which is extremely low (on the order of 107°), g does not 
noticeably approach the asymptotic value unless the number n of 
generations is on the order of 10°. It will be almost impossible to 
observe a population that became stationary under the action of 
mutations alone. Moreover, the irregularity in the rate of mutations, 
as well as in the rate of migration, restricts the validity of the formula, 
but in practice selection usually plays the principal role. 
(2) In the second case there is gametic selection only, or the
heterozygotes are exactly intermediate in viability; $w = 0$; $D$ is
horizontal, with ordinate $t$ (coefficient of total selection); $t < 0$ if the
gene $a$ is selected against; and $q$ tends toward the asymptotic value $\bar q$,
which is lower than $v/(u + v)$. Let us calculate $\bar q$. We have

$$\delta(q) = -tq^2 + (t - u - v)q + v,$$

the roots of which are

$$\frac{-t + u + v \pm \sqrt{(t - u - v)^2 + 4vt}}{-2t}.$$

Since $\delta(1) < 0$ and $-t > 0$, $\bar q$, which lies between 0 and 1, is the smaller
root; the other root, $\bar{\bar q}$, is obtained by taking the positive value of
the radical, and we have

$$\delta(q) = -t(q - \bar q)(q - \bar{\bar q}).$$

Therefore we will take for $m$ the minimum of $-\delta(q)/(q - \bar q)$, that is, the minimum
of $-t(\bar{\bar q} - q)$, which is the smaller of the two quantities $-t(\bar{\bar q} - \bar q)$
and $-t(\bar{\bar q} - q_0)$.

In the specific and usual case where $u$ and $v$ (reduced to the
mutation rate without any migration) are small compared with the
coefficient of total selection, $t$, the roots are given by

$$\frac{1}{2}\left[1 - \frac{u + v}{t} \pm \sqrt{\left(1 - \frac{u + v}{t}\right)^2 + \frac{4v}{t}}\,\right],$$

which is equivalent to $\frac{1}{2}\left[1 \pm \left(1 + \frac{2v}{t}\right)\right]$; therefore $\bar q \approx -v/t$,
$\bar{\bar q} \approx 1 + v/t$, and the asymptotic value $\bar q \approx -v/t$ is small. Selection
eliminates almost completely the unfavorable gene $a$; its complete
disappearance is prevented by the mutation rate, $v$, alone. Unless
$q_0$ is close to $\bar{\bar q}$, i.e., close to 1, $m$ is on the order of $-t$, whereas
it would equal only $u + v$ if there were no selection; the asymptotic
value is, therefore, reached much more rapidly.
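
The two specific cases can be checked with a brief Python sketch. The names `delta`, `iterate`, and `stationary_roots`, as well as all numerical rates, are assumptions made for illustration. In case (1) the deviation from $v/(u+v)$ shrinks by exactly $(1 - u - v)$ per generation; in case (2) the smaller root of the quadratic is close to $-v/t$.

```python
import math


def delta(q, u, v, t=0.0, w=0.0):
    """One-generation change: -u q + v (1 - q) + q (1 - q) (t + w q)."""
    return -u * q + v * (1 - q) + q * (1 - q) * (t + w * q)


def iterate(q0, n, u, v, t=0.0, w=0.0):
    """Apply q -> q + delta(q) for n generations."""
    q = q0
    for _ in range(n):
        q += delta(q, u, v, t, w)
    return q


def stationary_roots(u, v, t):
    """Roots of -t q^2 + (t - u - v) q + v = 0 (case w = 0), smaller root first."""
    disc = math.sqrt((t - u - v) ** 2 + 4 * v * t)
    r_plus = (-t + u + v + disc) / (-2 * t)
    r_minus = (-t + u + v - disc) / (-2 * t)
    return min(r_plus, r_minus), max(r_plus, r_minus)


# Case (1): no selection; assumed rates u = 2e-3, v = 1e-3, so q_bar = 1/3.
q_no_selection = iterate(0.9, 500, u=2e-3, v=1e-3)

# Case (2): gene a selected against (t = -0.01) with small mutation rates.
q_small, q_large = stationary_roots(1e-5, 1e-5, -0.01)
q_selected = iterate(0.5, 5000, 1e-5, 1e-5, t=-0.01)
```

With these assumed values the frequency under selection settles near $-v/t = 0.001$ within a few thousand generations, whereas mutation alone acts on a time scale of $1/(u+v)$ generations.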
FIGURE 6.


(B) The curve $C$ can be met by a straight line, such as $D_1$, in
three points (see Figure 6), when $C$ has two real tangents with the
same slope, $w$, as $D_1$, and when $D_1$ falls between these two tangents.
The tangents parallel to $D_1$, then, have their points of contact given
by

$$\frac{u}{(1 - q)^2} + \frac{v}{q^2} = w,$$

an equation having two real solutions, $q$, between 0 and 1 if $w$ is
greater than the minimum, $(u^{1/3} + v^{1/3})^3$, of the
first member between 0 and 1. If, moreover, $t$ is within the interval
$t_1 \ldots t_2$ formed by the ordinates at the origin of the tangents, the equation
$\delta(q) = 0$ will have three solutions between 0 and 1, in order of
magnitude $\bar q_1$, $\bar q_2$, $\bar q_3$. Each of these values results in a stationary
distribution that is maintained indefinitely, but if we start with a
different value of $q_0$, Figure 6 shows that:

(1) If $q_0 < \bar q_2$, $\delta(q)$, which has the sign of $y_1 - y_2$, is opposite in sign
to $q - \bar q_1$; the difference $r = q - \bar q_1$ decreases in absolute value from
its initial one, $r_0 = q_0 - \bar q_1$; if we take $m > 0$ as the minimum of
$-\delta(q)/(q - \bar q_1)$ in the interval $q_0 \ldots \bar q_1$, the difference $r$ is still
multiplied after $n$ generations by a
quantity less than $(1 - m)^n$, and $q$ tends toward the asymptotic
value $\bar q_1$.

(2) If $q_0 > \bar q_2$, the same reasoning shows that $q$ tends toward the
asymptotic value $\bar q_3$. The intermediate root, $\bar q_2$, of $\delta(q)$ corresponds,
therefore, to an unstable stationary state, from which $q$, depending on
whether it is smaller or larger than $\bar q_2$, tends toward the stable
stationary values $\bar q_1$ or $\bar q_3$.
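
The behavior around the unstable intermediate root can be illustrated with a small Python sketch. The parameter values below are assumptions chosen so that three stationary values exist and are known in closed form: with $u = v$ and $t = -w/2$, the polynomial factors as $\delta(q) = (1 - 2q)\,[v - wq(1 - q)/2]$. The function names are illustrative.

```python
import math


def delta(q, u, v, t, w):
    """One-generation change: -u q + v (1 - q) + q (1 - q) (t + w q)."""
    return -u * q + v * (1 - q) + q * (1 - q) * (t + w * q)


def iterate(q0, n, u, v, t, w):
    """Apply q -> q + delta(q) for n generations."""
    q = q0
    for _ in range(n):
        q += delta(q, u, v, t, w)
    return q


# Assumed illustrative values; W exceeds (U**(1/3) + V**(1/3))**3 = 0.008.
U, V, T, W = 0.001, 0.001, -0.05, 0.1

# Closed-form stationary values for this symmetric choice of parameters.
root = math.sqrt(1 - 8 * V / W)
Q1, Q2, Q3 = (1 - root) / 2, 0.5, (1 + root) / 2

# Starting on either side of the unstable value Q2 leads to Q1 or to Q3.
from_below = iterate(0.4, 5000, U, V, T, W)
from_above = iterate(0.6, 5000, U, V, T, W)
```

Starting just below $\bar q_2$ the frequency moves to the lower stable value, and starting just above it moves to the upper one, as described in (1) and (2).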

(C) Let us study directly any type of selection, when mutations
and migration are negligible, i.e., $u = v = 0$; this case does not come
directly under the preceding presentation, because under these
conditions, curve $C$ degenerates. We have

$$\delta(q) = q(1 - q)(t + wq) = wq(1 - q)(q - a), \quad \text{with } a = -t/w$$

($a$ could be inside or outside the interval $0 \ldots 1$). The stationary
values are $q = 0$, $q = 1$, and $q = a$ if $0 < a < 1$.

(1) If $a > 1$ or $a < 0$, $\delta(q)$ has a constant sign; if, for example,
the sign is negative, $q$ always decreases; $-\delta(q)/q$ has a positive
minimum $m$. We deduce from this that $q$ tends toward zero faster
than $(1 - m)^n$ does, and so gene $a$ is eliminated whatever its initial
frequency was (if there were a very low rate of mutation, $a$ would
persist with a low frequency, as happens with gametic selection).
If $\delta(q)$ is positive, $q \to 1$ and gene $a$ is fixed whatever its initial
frequency was.

(2) If $0 < a < 1$, two cases must be distinguished:

(a) If $w > 0$, $\delta(q)$ always has the same sign as $q - a$. The change
in $q$, and therefore in $q - a$, has the same sign as $q - a$; $q - a$
increases in absolute value from its initial value of $q_0 - a$. As
previously, we note that $q$ tends toward zero if $q_0 < a$, and $q$ tends
toward 1 if $q_0 > a$. One of the genes is still eliminated, but this time
which gene is eliminated depends on the initial frequency.

(b) If $w < 0$, $\delta(q)$ is always opposite in sign to $q - a$. We note
again that the difference $r = q - a$ decreases in absolute value and
tends to zero. In the asymptotic distribution, the two genes $a$ and $A$
coexist with the stationary frequencies $a$ and $1 - a$. It is easy to see
that this is so in the important, specific situation when $s = 0$, $\sigma < 0$,
$h > 1$, that is, when there is exclusively zygotic selection and the
heterozygote is superior in viability to either homozygote, provided
consanguinity is not too high. In fact, we have $w < 0$, and

$$a = -\frac{t}{w} = \frac{h + \lambda/(1 - \lambda)}{2h - 1}$$

is positive but not less than 1 except if
$\lambda/(1 - \lambda) < h - 1$, which makes necessary that $1/(1 - \lambda) < h$, that
is, $\lambda < 1 - 1/h$.
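
The coexistence described in case (2b) can be illustrated numerically. In this Python sketch the function names and the values $\sigma = -0.1$, $h = 2$ are assumptions for the example; it uses the expressions $t = -s - \lambda\sigma - h\sigma(1 - \lambda)$ and $w = \sigma(2h - 1)(1 - \lambda)$ of this section, with $u = v = 0$.

```python
def selection_coefficients(s, sigma, h, lam):
    """Return (t, w) with t = -s - lam*sigma - h*sigma*(1 - lam)
    and w = sigma * (2h - 1) * (1 - lam)."""
    t = -s - lam * sigma - h * sigma * (1 - lam)
    w = sigma * (2 * h - 1) * (1 - lam)
    return t, w


def iterate_selection(q0, n, t, w):
    """Apply q -> q + q (1 - q) (t + w q) for n generations (u = v = 0)."""
    q = q0
    for _ in range(n):
        q += q * (1 - q) * (t + w * q)
    return q


# Heterozygote advantage under random mating (lambda = 0): a = h / (2h - 1).
t_rm, w_rm = selection_coefficients(s=0.0, sigma=-0.1, h=2.0, lam=0.0)
a_rm = -t_rm / w_rm
q_from_low = iterate_selection(0.1, 1000, t_rm, w_rm)
q_from_high = iterate_selection(0.95, 1000, t_rm, w_rm)

# Strong consanguinity (lambda = 0.6 > 1 - 1/h = 0.5) pushes a above 1.
t_in, w_in = selection_coefficients(s=0.0, sigma=-0.1, h=2.0, lam=0.6)
a_inbred = -t_in / w_in
```

With $\lambda = 0$ both starting frequencies converge to $a = 2/3$; with $\lambda = 0.6$ the value of $a$ exceeds 1, so the interior equilibrium disappears, in agreement with the condition $\lambda < 1 - 1/h$.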


REMARK. 
We can easily verify that the case u = v = 0 of (C) appears as a special 
case in the graphic discussion of (A) or of (B), if we consider the 
curve C to have degenerated into the broken line defined by (g = 0, 
p< 0:0 44 <2, vy [Orga ys 0S 

It follows that if u and v are small with respect to ¢ and w but not 
equal to zero (dotted line), the discussion will be the same as in (©), 
the only difference being that elimination and fixation will be replaced 
by an asymptotic equilibrium corresponding to a frequency of J, 
close to 0 or 1. 


3.2.2 The Case of a Finite Population 


Let $N$ be the number of individuals in each generation. If $q$ is
the frequency of a in $F_n$, we have seen that the probability of a in
$F_{n+1}$ will be $q + \delta(q)$, $\delta(q)$ being represented (as a first
approximation) by a third-degree polynomial. But the frequency, $q_1$, of a in
$F_{n+1}$ will differ from the probability, $q + \delta(q)$, because this
frequency is a random variable for which $q + \delta(q)$ represents only the
mean value. When the law of probability of $q_1$ is known as a function of $q$,
e.g., $\theta(q, q_1)\,dq_1$, the frequencies of a in successive generations
appear as random variables in a simple Markov chain whose law of
transition is $\theta(q, q_1)\,dq_1$, which is assumed to be independent of the
rank, $n$, of the generation considered, as is possible if $N$ is constant.
If we assume that the transition in one generation, or in a certain
number of generations, from any frequency, $q$, to any other frequency,
$q_1$, is possible, then $\theta(q, q_1)$ is always greater than zero. This
assumption implies that the rates of mutation $u$ and $v$ are not equal to zero,
because otherwise we could not pass from $q = 0$ or $q = 1$ to different
values. Markov's theorem indicates then that the a priori law of
probability, $\phi_n(q)\,dq$, of the frequency $q$ in the generation $F_n$, tends
toward a limit law, $\phi(q)\,dq$, which is independent of the initial value
of $q$, when $n$ tends toward infinity.

* C will be met at only one point by the straight line D if $a > 1$ or $a < 0$
(case 1) or if $0 < a < 1$ and $w < 0$ (case 2b), but in three points if
$0 < a < 1$ and $w > 0$ (case 2a).
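The role of nonzero mutation rates in Markov's theorem can be seen on a very small chain. The sketch below is an illustration under stated assumptions, not the author's construction: it assumes that the $2N$ gametes are drawn binomially with success probability $q + \delta(q)$, keeps only mutation in $\delta(q)$, and uses arbitrary values for `two_n`, `u` and `v`.

```python
from math import comb


def transition_matrix(two_n, u, v):
    """rows[i][j]: probability of j copies of a next generation given i now.

    Assumes binomial sampling of 2N gametes around the mean q + delta(q),
    with delta(q) = -u q + v (1 - q) (mutation only, for illustration).
    """
    rows = []
    for i in range(two_n + 1):
        q = i / two_n
        mean = q - u * q + v * (1.0 - q)
        rows.append([comb(two_n, j) * mean ** j * (1.0 - mean) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows


def evolve(start_state, rows, generations):
    """A priori law after the given number of generations, from a fixed start."""
    size = len(rows)
    dist = [0.0] * size
    dist[start_state] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * rows[i][j] for i in range(size))
                for j in range(size)]
    return dist


two_n = 20
with_mutation = transition_matrix(two_n, 0.02, 0.02)
from_lost = evolve(0, with_mutation, 2000)        # start at q = 0
from_fixed = evolve(two_n, with_mutation, 2000)   # start at q = 1
gap = max(abs(x - y) for x, y in zip(from_lost, from_fixed))

no_mutation = transition_matrix(two_n, 0.0, 0.0)
stuck = evolve(0, no_mutation, 2000)              # q = 0 is absorbing

print("largest difference between the two limit laws:", gap)
print("probability of remaining at q = 0 without mutation:", stuck[0])
```

When $u$ and $v$ are positive the two starting points lead to the same limit law; when they are zero the chain started at $q = 0$ never leaves it, which is why the hypothesis $\theta(q, q_1) > 0$ excludes that case.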

It is possible to formulate these laws in terms of certain hypotheses
concerning the law of transition, $\theta(q, q_1)\,dq_1$, which is the law of
probability of $q_1$ when $q$ is fixed. Let us assume it to be a form of
Gauss's law with mean value $q + \delta(q)$, $\delta(q)$ being small and such
that $\delta(0) > 0$ and $\delta(1) < 0$, and with a small variance,
$\sigma^2 = w(q) > 0$, being equal to zero only for $q = 0$ and $q = 1$
(here $w(q)$, written with its argument, denotes the variance function and
is distinct from the selection coefficient $w$ of §3.2.1).

Let us assume, for instance, that the $2N$ gametes which produce
$F_{n+1}$ are taken at random from an infinitely large number of
gametes produced by $F_n$, and have essentially the frequencies $q$ and
$(1 - q)$ for a and A. We know that the law of probability of the
frequency of a in $F_{n+1}$ will be practically Gaussian, and that the
conditional variance of this frequency with respect to its mean value
will be $\sigma^2 = w(q) = q(1 - q)/2N$, which becomes zero only
if $q = 0$ or $q = 1$.

In general, if because of systematic inbreeding in $F_n$ the
$N$ zygotes of $F_{n+1}$ are each taken at random with the conditional
probabilities $P = p(p + \lambda q)$, $2Q = 2pq(1 - \lambda)$, and
$R = q(q + \lambda p)$ for the three states AA, Aa, and aa, $\lambda$ being
the conditional inbreeding coefficient of $F_{n+1}$, the conditional variance
of the frequency $q_1$ of a in $F_{n+1}$ with respect to its mathematical
expectation $q$ will be $\sigma^2 = q(1 - q)(1 + \lambda)/2N$.
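The variance $q(1-q)(1+\lambda)/2N$ can be checked by direct sampling. The following is a minimal Monte Carlo sketch: the values of `N`, `q`, `lam`, the seed and the number of replicates are arbitrary assumptions, and the zygotes are drawn independently with the three probabilities given above.

```python
import random


def sample_frequency(n_zygotes, q, lam, rng):
    """Frequency of gene a among N zygotes drawn as AA, Aa or aa."""
    p = 1.0 - q
    weights = [p * (p + lam * q),      # AA carries 0 copies of a
               2 * p * q * (1 - lam),  # Aa carries 1 copy
               q * (q + lam * p)]      # aa carries 2 copies
    copies = rng.choices([0, 1, 2], weights=weights, k=n_zygotes)
    return sum(copies) / (2 * n_zygotes)


rng = random.Random(1948)  # fixed seed so the run is reproducible
N, q, lam = 50, 0.3, 0.2
samples = [sample_frequency(N, q, lam, rng) for _ in range(20000)]

mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
expected = q * (1 - q) * (1 + lam) / (2 * N)

print("empirical mean:", mean)
print("empirical variance:", variance, "formula:", expected)
```

Setting `lam = 0` recovers the purely gametic sampling variance $q(1-q)/2N$ of the preceding paragraph.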


A. Fundamental Equation. In the transition from generation $F_n$
to $F_{n+1}$, the a priori law of probability of the frequency changes from
$\phi_n(q)\,dq$ to

$$\phi_{n+1}(q_1)\,dq_1 = dq_1 \int_0^1 \phi_n(q)\,\theta(q, q_1)\,dq.$$

If we call $M_i$ and $M_i'$ the moments of the a priori law of probability
in $F_n$ and in $F_{n+1}$, we have:

$$M_i = \int_0^1 q^i \phi_n(q)\,dq;$$

$$M_i' = \int_0^1 q_1^i\,\phi_{n+1}(q_1)\,dq_1
= \int_0^1 \left[\int_0^1 q_1^i\,\theta(q, q_1)\,dq_1\right] \phi_n(q)\,dq
= \int_0^1 \mu_i(q)\,\phi_n(q)\,dq$$

(by inverting the integrations, which is legitimate for functions that
are bounded and can be integrated within finite intervals).

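If $\mu_i(q)$ are the moments of Gauss's law, $\theta(q, q_1)\,dq_1$, whose mean
and variance are $q + \delta(q)$ and $w(q)$, respectively, and if $\delta$ and
$w(q)$ are small, these moments are calculated by developing the characteristic
function according to the powers of its variable $r$:

$$\exp\left[(q + \delta) r + w r^2/2\right] = 1 + (q + \delta) r + w r^2/2 + \dots
+ \left[(q + \delta) r + w r^2/2\right]^i / i! + \dots$$

We see that, by disregarding the terms in $w^2$ and $\delta^2$,

$$\frac{\mu_i}{i!} = \frac{(q + \delta)^i}{i!} + \frac{(i - 1)(q + \delta)^{i-2}\, w}{2\,(i - 1)!} + O(w^2),$$

and

$$\mu_i = q^i + i\,\delta\, q^{i-1} + \frac{i(i-1)}{2}\, w\, q^{i-2} + O(w^2) + O(\delta^2) + O(\delta w);$$

therefore, the variation of the moments from one generation to the
next is

$$M_i' - M_i = \int_0^1 \left[\mu_i(q) - q^i\right] \phi_n(q)\,dq
= i \int_0^1 \delta(q)\, q^{i-1} \phi_n(q)\,dq + \frac{i(i-1)}{2} \int_0^1 w(q)\, q^{i-2} \phi_n(q)\,dq. \tag{3.2.1}$$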


If we assume that $\delta(q)$ and $w(q)$ can be represented by polynomials,
an assumption verified exactly by the specific forms which we have indicated,
we shall write:

$$\delta(q) = \sum_{l \ge 0} A_l q^l; \qquad w(q) = \sum_{l \ge 0} B_l q^l.$$

By comparing the small variation, $M_i' - M_i$, to a derivative $dM_i/dt$
(time, $t$, being measured in generations), equation (3.2.1) is
transformed to a differential system for the moments:

$$\frac{dM_i}{dt} = i \sum_l A_l M_{i+l-1} + \frac{i(i-1)}{2} \sum_l B_l M_{i+l-2}. \tag{3.2.2}$$


This system cannot be solved directly, because the right-hand side
of the equation contains moments of higher order than the left-hand side;
it enables us, however, to obtain a partial differential equation for the
characteristic function (or Laplace transformation) of the probability
law $\phi(q, t)\,dq$,* for which the moments are $M_i(t)$. In fact, this
transformation is

$$F(s, t) = \int_0^1 e^{sq} \phi(q, t)\,dq = \sum_{p \ge 0} M_p(t)\, s^p / p!,$$

with derivative

$$\frac{\partial^k F}{\partial s^k} = \sum_{p \ge k} M_p\, s^{p-k} / (p - k)!;$$

these functions always exist since we integrate only between 0 and 1.
By multiplying equation (3.2.2) by $s^{i-1}/i!$ and summing over $i$ from
0 to $+\infty$, we obtain

$$\frac{1}{s} \frac{\partial F}{\partial t} = \sum_l A_l \frac{\partial^l F}{\partial s^l} + \frac{s}{2} \sum_l B_l \frac{\partial^l F}{\partial s^l}. \tag{3.2.3}$$


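Following the Laplace transformation, by setting

$$F(s, t) = \mathcal{L}[\phi(q, t)],$$

we have

$$\frac{\partial F}{\partial t} = \mathcal{L}\left[\frac{\partial \phi}{\partial t}\right], \qquad
\frac{\partial^l F}{\partial s^l} = \int_0^1 e^{sq} q^l \phi(q, t)\,dq = \mathcal{L}[q^l \phi(q, t)],$$

$$\frac{1}{s} \frac{\partial F}{\partial t} = \frac{1}{s} \int_0^1 e^{sq} \frac{\partial \phi}{\partial t}\,dq = -\int_0^1 e^{sq}\, V\,dq = -\mathcal{L}[V],$$

by setting

$$V = \int_0^q \frac{\partial \phi}{\partial t}\,dq$$

and noting that $V = 0$ for $q = 0$ and for $q = 1$.

* This function should be integrable when $0 < q < 1$ for all values of $t$. It
will be evident from equations (3.2.6) and (3.2.10) that the condition has to be
supposed true only for $t = 0$, provided $u > 0$ and $v > 0$.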

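Since two functions for which the Laplace transformations are
the same are identical almost everywhere, we obtain from equation
(3.2.3), after a similar integration by parts of the term in $s/2$
(the product $w(q)\phi$ vanishing at both limits),

$$-\int_0^q \frac{\partial \phi(q, t)}{\partial t}\,dq = \sum_l A_l q^l \phi(q, t) - \frac{1}{2} \frac{\partial}{\partial q} \left[\sum_l B_l q^l \phi(q, t)\right];$$

that is,

$$\frac{\partial}{\partial t} \int_0^q \phi(q, t)\,dq = \frac{1}{2} \frac{\partial}{\partial q} \left[w(q)\,\phi(q, t)\right] - \delta(q)\,\phi(q, t). \tag{3.2.4}$$

Such is the fundamental equation.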


B. Asymptotic Probability Law. If we consider $\phi(q)\,dq$ the law
of asymptotic probability for infinite $t$, then, according to Markov's
theory, the law of stationary probability, verifying (3.2.4), will be
the solution of

$$\frac{d}{dq} \left[w(q)\,\phi(q)\right] = 2\,\delta(q)\,\phi(q). \tag{3.2.5}$$

It is, therefore, the law for which the probability density is

$$\phi(q) = \frac{K}{w(q)} \exp\left[2 \int^q \frac{\delta(q)}{w(q)}\,dq\right]. \tag{3.2.5'}$$

In particular, when

$$w(q) = q(1 - q)/2N$$

and

$$\frac{\delta(q)}{q(1 - q)} = -\frac{u}{1 - q} + \frac{v}{q} + t + wq,$$

we have

$$\phi(q)\,dq = K_1\, q^{4Nv - 1} (1 - q)^{4Nu - 1} e^{2N(wq^2 + 2tq)}\,dq, \tag{3.2.6}$$

with $K_1$ determined in such a way that the integral between 0 and 1
is equal to 1.

This formula, given by Wright [22, 23, 24] for specific cases but
without general demonstration, represents the probability that, in
a limited population of $N$ individuals, a gene a, with given coefficients
of mutation, migration, and selection, after an infinitely large
number of generations, has a frequency between $q$ and $q + dq$. It
also represents, therefore, the law of asymptotic distribution of
gene a, after an infinitely long time in an infinitely large number of
populations of the same size $N$, and in which all the coefficients
would be the same. Let us indicate some specific cases.

(1) If $u = 0$, or $v = 0$, $K_1$ is by necessity zero, since the integral
between 0 and 1 of $1/q$ or of $1/(1 - q)$ is infinite. This result
indicates that, eventually, genes not affected by mutation or migration
will certainly be either eliminated or fixed.

(2) If $4Nu$ and $4Nv$ are greater than 1, i.e., if the population size is
large enough, and the mutation or migration rates are not too low,
$\phi(q) = 0$ for $q = 0$ and $q = 1$, and is represented by a bell- or
double-bell-shaped curve (Figure 7) with one or more dominant $q_1$
given by the equation $d\phi/dq = 0$, that is, $4N\delta(q_1) + 2q_1 - 1 = 0$,
which, for a very large $N$, becomes $\delta(q) = 0$; i.e., it gives again the
same equation for the asymptotic values $\bar q$ in a very large population
that were studied in §3.2.1.

FIGURE 7. [Graph of $\phi(q)$ against $q$ between 0 and 1.]

Simple results are obtained by assuming that there is only one
dominant $q_1$, and that $q$ remains close to it with a probability not
far from 1; as a first approximation, let us replace $\delta(q)$ by a linear
function of $q$, i.e.,

$$\delta(q) = -k(q - \bar q),$$

$\bar q$ being, by necessity, the asymptotic value in a very large population,
as in §3.2.1(A), 3.2.1(C.1), or 3.2.1(C.2.b), and $k$ being equal
to $-\delta'(\bar q)$ and therefore being on the order of magnitude of the
largest of the numbers $u$, $v$, $w$, $t$, according to Taylor's formula.
Formula (3.2.7) would be exact if $\delta(q)$ were linear, i.e., if there were
no selection, as in §3.2.1(A.1). The dominant $q_1$ is given, then, by

$$-4Nk(q_1 - \bar q) + 2q_1 - 1 = 0,$$

from which

$$q_1 = \frac{4Nk\bar q - 1}{4Nk - 2},$$

and if $4Nk$ is large, the dominant coincides essentially with the
asymptotic value in an infinite population. The asymptotic distribution
(3.2.5') is written

$$\phi(q) = K_1\, q^{4Nk\bar q - 1} (1 - q)^{4Nk(1 - \bar q) - 1}$$

with

$$K_1 = 1 / B\left(4Nk\bar q,\; 4Nk(1 - \bar q)\right),^*$$

and its moments are given by formula (3.2.2), which can be written

$$0 = \frac{dM_i}{dt} = i k (\bar q M_{i-1} - M_i) + \frac{i(i-1)}{4N} (M_{i-1} - M_i).$$

We start with $M_0 = 1$, $M_1 = \bar q$, and

$$2k(M_2 - \bar q^2) = \frac{1}{2N} (\bar q - M_2),$$

from which we can deduce ($\sigma^2$ being the variance) that

$$\sigma^2 = M_2 - \bar q^2 = \frac{\bar q(1 - \bar q)}{4Nk + 1} \approx \frac{\bar q(1 - \bar q)}{4Nk}$$

if $4Nk$ is large, i.e., if the order of magnitude of $k$ is greater than
that of $1/N$; $\sigma^2$ is then small, the distribution is concentrated, and
it is legitimate to admit that $q$ varies, practically, in an interval of
limited range and that $\delta(q)$ is linear.

For a large number of populations under the same conditions,
all having the same size, $N$, sufficiently large for $4Nk$ to be large,
the asymptotic frequencies observed will almost all be grouped
around the value that corresponds to an infinite population. The
experimental estimate of the variance of these frequencies will
enable us to determine $k$, if we know $N$, and the mean of $q$ gives
$\bar q$, that is, $v/(u + v)$ or $-(v/t)$ or
$[h + \lambda/(1 - \lambda)]/(2h - 1)$.

(3) If $4Nu$ and $4Nv$ are less than one, $\phi(q)$ is infinite for $q = 0$
and $q = 1$ and is represented by a U-shaped curve (see Figure 8).
The smaller $u$ and $v$ get, the smaller $K_1$ becomes. It is the frequencies
close to $q = 0$ and $q = 1$ that have by far the highest probability.
Most of the genes are approaching fixation or elimination, and the
only thing that stops the approach is recurring mutations or renewed
migration. There is a basic difference, then, between the case of a
population that is very small or that has very low rates of mutation
and migration, and whose genes tend toward fixation or elimination,
and the case of a large population with each gene almost stabilized
around a determined frequency.

FIGURE 8. [Graph of $\phi(q)$ against $q$ between 0 and 1.]

* B stands for the Eulerian integral.
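The three cases above can be explored numerically from the density (3.2.6). The sketch below evaluates its unnormalised logarithm, checks the variance $\bar q(1-\bar q)/(4Nk+1)$ in a case without selection, and contrasts a concentrated law with a U-shaped one. The function names, the grid size and all parameter values are assumptions introduced for illustration; the midpoint rule is used only where $4Nu$ and $4Nv$ exceed 1, so that the integrand stays bounded.

```python
import math


def log_density(q, N, u, v, t=0.0, w=0.0):
    """Logarithm of (3.2.6) up to the additive constant log K_1."""
    return ((4 * N * v - 1) * math.log(q)
            + (4 * N * u - 1) * math.log(1.0 - q)
            + 2 * N * (w * q * q + 2 * t * q))


def mean_and_variance(N, u, v, t=0.0, w=0.0, cells=20000):
    """Midpoint-rule moments; intended for 4Nu > 1 and 4Nv > 1 only."""
    h = 1.0 / cells
    qs = [(i + 0.5) * h for i in range(cells)]
    logs = [log_density(q, N, u, v, t, w) for q in qs]
    top = max(logs)                      # rescale to avoid underflow
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    mean = sum(q * wt for q, wt in zip(qs, weights)) / total
    var = sum((q - mean) ** 2 * wt for q, wt in zip(qs, weights)) / total
    return mean, var


def ends_dominate(N, u, v, t=0.0, w=0.0, eps=1e-3):
    """True when the density near both ends exceeds its value at q = 1/2."""
    middle = log_density(0.5, N, u, v, t, w)
    return (log_density(eps, N, u, v, t, w) > middle
            and log_density(1.0 - eps, N, u, v, t, w) > middle)


# Case (2): 4Nu = 12 and 4Nv = 8, no selection, so delta(q) is linear.
N, u, v = 1000, 0.003, 0.002
k, q_bar = u + v, v / (u + v)
mean, var = mean_and_variance(N, u, v)
predicted = q_bar * (1 - q_bar) / (4 * N * k + 1)
print("mean:", mean, "variance:", var, "formula:", predicted)

# Case (3): 4Nu = 4Nv = 0.4, the law piles up near q = 0 and q = 1.
print("bell case, ends dominate:", ends_dominate(1000, 0.003, 0.002))
print("small population, ends dominate:", ends_dominate(100, 0.001, 0.001))
```

Passing nonzero `t` and `w` to the same functions gives the law with selection, for which the variance formula is only approximate.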


C. Evolution of the Probability Law over Time. Let us call $\phi(q, t)$,
which verifies equation (3.2.4), and $\Phi(q, t) = \int_0^q \phi(q, t)\,dq$ the
law of elementary probability and the integral law at time $t$, respectively;
let us call $\phi(q)$ and $\Phi(q) = \int_0^q \phi(q)\,dq$ the asymptotic law,
deduced from (3.2.5); we designate by $R(q, t) = \Phi(q, t) - \Phi(q)$ the
difference between the integral law at instant $t$ and the asymptotic law.
This difference is given for the initial instant as $R(q, 0) = R_0(q)$; it
satisfies the conditions at the limits $R(0, t) = R(1, t) = 0$ and it
verifies, evidently, the equation [obtained by subtracting (3.2.5) from
(3.2.4)]

$$\frac{\partial R}{\partial t} = \frac{1}{2} \frac{\partial}{\partial q} \left[w(q) \frac{\partial R}{\partial q}\right] - \delta(q) \frac{\partial R}{\partial q}. \tag{3.2.7}$$

The difference will be determined, therefore, by obtaining the solutions
of (3.2.7) which become zero for $q = 0$ and $q = 1$ and are of
the form $R = K(q) \cdot L(t)$. These solutions must satisfy

$$\frac{L'(t)}{L(t)} = \frac{w}{2} \frac{K''(q)}{K(q)} + \left(\frac{w'}{2} - \delta\right) \frac{K'(q)}{K(q)},$$

for which it is necessary that

$$L(t) = e^{-\lambda t}, \tag{3.2.8}$$

and

$$\frac{w}{2} K'' + \left(\frac{w'}{2} - \delta\right) K' + \lambda K = 0, \tag{3.2.9}$$

with $K(0) = K(1) = 0$.

The last equation cannot be satisfied with these conditions at the
limits unless the constant $\lambda$ belongs to the series of "proper values"
(eigenvalues), $\lambda_i$, which are real, positive, and, presumably, arranged
in increasing order of magnitude. By calling $K_i(q)$ the "proper solution"
corresponding to $\lambda_i$, any series

$$R(q, t) = \sum_i A_i e^{-\lambda_i t} K_i(q)$$

satisfies, simultaneously, both (3.2.7) and the conditions at the limits.
In addition, it satisfies the initial condition $R(q, 0) = R_0(q)$ if the
coefficients $A_i$ are chosen so that $\sum_i A_i K_i(q) = R_0(q)$, i.e., if they
are given by the expansion of the function $R_0(q)$ in series of functions
$K_i(q)$. We know that such an expansion is possible for a function $R_0(q)$
which is continuous and equal to zero at the limits $q = 0$ and $q = 1$.
To express the expansion, it suffices to write equation (3.2.9) in the
reduced form

$$\frac{d^2 K}{dr^2} + \frac{2\lambda K}{w\,\phi^2(q)} = 0,$$

designating the new variable, $\int_0^q \phi(q)\,dq$, by $r$, which is the
function of total probability $\Phi(q)$. We know, then, that the proper
solutions $K_i(r)$ are orthogonal (and can be taken to be normalized) with
respect to the function $1/w\phi^2(q)$, i.e., that

$$\int_0^1 \frac{K_i(r) K_j(r)}{w\,\phi^2}\,dr = 0 \quad (i \ne j),$$

or, by going back to the variable $q$,

$$\int_0^1 \frac{K_i(q) K_j(q)}{w(q)\,\phi(q)}\,dq = 0 \quad (i \ne j),$$

where

$$\int_0^1 \frac{[K_i(q)]^2}{w(q)\,\phi(q)}\,dq = 1.$$

The coefficients $A_i$ of the expansion of $R_0(q)$ are, therefore, of the
form

$$A_i = \int_0^1 \frac{R_0(q) K_i(q)}{w(q)\,\phi(q)}\,dq,$$

which gives the solution to the problem as

$$R(q, t) = \sum_{i=1}^{\infty} A_i e^{-\lambda_i t} K_i(q), \tag{3.2.10}$$

which is a uniformly converging series. We note that the magnitude
of the decrease of the difference $R(q, t)$ between the integral law
at instant $t$ and the asymptotic law is on the order of $e^{-\lambda_1 t}$,
$\lambda_1$ being the first proper value, unless the initial deviation,
$R_0(q)$, is orthogonal to the function $K_1(q)$ with respect to
$1/w\phi(q)$. The rate of the process is thus characterized.

It is easy to resolve the problem completely in the case previously
studied, where $\delta(q)$ can be replaced by the linear function
$\delta(q) = -k(q - \bar q)$. Then equation (3.2.9), where
$w(q) = q(1 - q)/2N$, becomes Gauss's equation

$$q(1 - q) K'' + [1 - 2q + 4Nk(q - \bar q)] K' + 4N\lambda K = 0. \tag{3.2.9'}$$

The Gaussian parameters here are $\alpha$ and $\beta$, the roots of

$$\alpha^2 + (4Nk - 1)\alpha - 4N\lambda = 0, \quad \text{and} \quad \gamma = 1 - 4Nk\bar q. \tag{3.2.11}$$

Calling $F(\alpha, \beta, \gamma; q)$ the hypergeometric series, the general
solution of (3.2.9') is

$$C_1 F(\alpha, \beta, \gamma; q) + C_2\, q^{1 - \gamma} F(\alpha', \beta', \gamma'; q),$$

where

$$\alpha' = \alpha + 1 - \gamma, \qquad \beta' = \beta + 1 - \gamma, \qquad \gamma' = 2 - \gamma.$$

The solutions that equal zero for $q = 0$ correspond to $C_1 = 0$.
There will be, therefore, "proper solutions" becoming equal to zero
both when $q = 0$ and when $q = 1$, provided that
$F(\alpha', \beta', \gamma'; 1)$, which, according to Gauss's theory, is
equal to

$$\frac{\Gamma(\gamma')\,\Gamma(\gamma' - \alpha' - \beta')}{\Gamma(\gamma' - \alpha')\,\Gamma(\gamma' - \beta')},$$

equals 0; this requires that $\alpha$ or $\beta$ be equal to
a whole number, $n \ge 1$, i.e., that equation (3.2.11) have a whole,
positive root $n$, which gives for $\lambda$ the "proper values"
$\lambda_n = n^2/4N + n(k - 1/4N)$, values that increase from $k$ to $+\infty$.

The corresponding proper standardized solutions are the hypergeometric
functions

$$K_n(q) = h_n\, q^{4Nk\bar q}\, F(n + 4Nk\bar q,\; 1 - n - 4Nk + 4Nk\bar q,\; 1 + 4Nk\bar q;\; q).$$

The constants $h_n$ are chosen to give

$$\int_0^1 \frac{[K_n(q)]^2}{q^{4Nk\bar q} (1 - q)^{4Nk(1 - \bar q)}}\,dq = 1.$$

The coefficients $A_n$ are given by

$$A_n = \int_0^1 \frac{R_0(q) K_n(q)}{q^{4Nk\bar q} (1 - q)^{4Nk(1 - \bar q)}}\,dq.$$

The difference is given by the formula (3.2.10).

Since $\lambda_1 = k$, the order of magnitude of the decrease of this
difference will be, in general, that of $e^{-kt}$; the number $t$ of
generations needed to approach the state of asymptotic equilibrium
appreciably, therefore, will be on the order of magnitude of $1/k$. We have
seen [§3.2.1(A)] that when $\delta(q)$ has the general form derived at the
end of §3.2(E), but the distribution remains, over time, sufficiently
concentrated around the value $\bar q$, we take

$$k = -\delta'(\bar q) = u + v - (1 - 2\bar q)\,t - w\bar q(2 - 3\bar q);$$

$k$ is, then, on the order of magnitude of the largest (in absolute value)
of the quantities $u$, $v$, $t$, $w$. When all these quantities are small,
$1/k$ is large, and the number of generations needed to approach equilibrium
is considerable. We cannot assume, therefore, that a natural population
has reached the state of equilibrium unless conditions have
remained the same during a very long period of time.
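The orders of magnitude involved are easy to tabulate from the proper values $\lambda_n = n^2/4N + n(k - 1/4N)$. The short sketch below uses arbitrary illustrative values of $N$ and $k$; the function names are introduced here and do not come from the text.

```python
def proper_values(N, k, count=5):
    """First few proper values lambda_n = n^2/(4N) + n (k - 1/(4N))."""
    return [n * n / (4 * N) + n * (k - 1 / (4 * N))
            for n in range(1, count + 1)]


def quadratic_residual(n, N, k, lam):
    """Left-hand side of (3.2.11) evaluated at alpha = n."""
    return n * n + (4 * N * k - 1) * n - 4 * N * lam


N, k = 1000, 0.005
lams = proper_values(N, k)
residuals = [quadratic_residual(n, N, k, lam)
             for n, lam in enumerate(lams, start=1)]

print("proper values:", lams)
print("generations to approach equilibrium, about 1/k =", 1 / k)
```

Because the slowest term decays as $e^{-kt}$, halving $k$ doubles the number of generations needed, whatever the value of $N$.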

The preceding method does not apply any longer in cases where
there are neither mutations nor migrations, i.e., when $u = v = 0$,
because then $K = 0$ and the density of asymptotic probability,
$\phi(q)\,dq$, equals zero at any point between 0 and 1. All probability is
concentrated at the two extremes, $q = 0$ and $q = 1$. The manner in
which this asymptotic state is reached can be studied by a different
method [11].


3.3 INFLUENCE OF MIGRATION 


The hypothesis by which Wright [22, 23, 24] explains the effects of
migration would apply well only to an island population receiving
migrants from a large continental population with constant composition.
A scheme closer to the actual situation, which takes into
account the interaction of one group with another by migration,
would be the following. Let a population be distributed over an
area $A$ with a density $\delta(P)$ at point $P$ with coordinates $(x, y)$.
Let us assume that each individual, from the time of birth to the
reproductive stage, has a known probability, $f(P, Q)\,dS_Q$, of migrating from
the point $P$ to an elementary area, $dS_Q$, centered at point $Q$
$\left(\iint_A f(P, Q)\,dS_Q = 1\right)$. According to Bayes's formula, each
parent of an individual born at point $Q$ will have the known probability,

$$g(P, Q)\,dS_P = \frac{\delta(P) f(P, Q)\,dS_P}{\iint_A \delta(P) f(P, Q)\,dS_P},$$

of being born in an area $dS_P$ centered around point $P$
$\left(\iint_A g(P, Q)\,dS_P = 1\right)$.


Let $X_n(C)$ be the random variable representing the state of a
Mendelian locus in an individual of the $n$th generation born at
point $C$. A priori, $X_n(C)$ will take the values 1 or 0, corresponding
to the allelic states a or A, with a priori probabilities $q$ and $p = 1 - q$,
depending on the point $C$ and the rank $n$ of the generation; the
$X_n$'s relative to two different points $C$ will have a stochastic
relation.* The random variables $X_{n+1}(D)$ relative to the following
generation will have conditional probabilities well determined on the basis
of the $X_n(C)$ values. According to the theory of Markov chains, it
follows that the a priori probabilities of the $X_n(C)$'s and their
relationships will tend eventually toward a stationary state, independent
of the rank, $n$, of the generation. It is this stationary state we propose
to study.

If $u$ and $v$ are the probabilities of mutation of a into A and of
A into a in each generation, the conditional expectation of the
random variable $X'$ relative to a locus of an offspring of a specified
parent will be

$$\mathfrak{M}(X') = (1 - u) X + v(1 - X),$$

$X$ being the specified value of the random variable attached to the
corresponding locus in the parent. This can be written

$$\mathfrak{M}(X') = (1 - k) X + kc,$$

calling $c$ the quantity $\bar q = v/(u + v)$ and $k$ the quantity $v + u$
corresponding to the mutation pressure.† Since there is no stochastic
relation among children other than the one resulting from the possible
relation among their parents, the joint moments
$\mathfrak{M}[X'(C) X'(D) \dots]$ of a certain number of random variables $X'$
of the $(n + 1)$th generation will be linear combinations of the products of
the variables $X$ of the parents, if the latter are known; if they are
unknown, the joint moments will be linear combinations of the mathematical
expectations of these products, i.e., of the joint moments of the $n$th
generation. By equating the joint moments of the two generations, we shall
obtain linear integral equations for determining these moments. We shall
indicate only the calculations for the moments of orders 1 and 2.

* If the coefficient of coancestry between individuals located in places $C$ and $D$
is called $\phi_n(C, D)$, the random variables $X_n(C)$ and $X_n(D)$ have an a priori
probability of $\phi_n$ of being identical and a probability of $1 - \phi_n$ of being
stochastically independent; this gives, as the value of their a priori correlation
coefficient, $\phi_n(C, D)$. The asymptotic value, $\phi(C, D)$, of this coefficient will be
calculated further; it is useful to know that it is the same as the coefficient of
coancestry between $C$ and $D$.

† If there is a constant selection pressure in favor of the heterozygote, we
know that it will be expressed, approximately, for a large number of individuals
by a formula of the same form ($k$ and $c$ naturally having other values). The
calculations we shall perform will, therefore, be a first approximation applicable
to constant selection, but they exclude, naturally, the "geographic selection"
that depends on location.
The mathematical expectation, $\mathfrak{M}(Q)$, of $X(Q)$ will be given by

$$\mathfrak{M}[X(Q)] = \iint_A \left\{(1 - k)\,\mathfrak{M}[X(P)] + kc\right\} g(P, Q)\,dS_P,$$

that is,

$$\mathfrak{M}(Q) = \iint_A (1 - k)\,\mathfrak{M}(P)\, g(P, Q)\,dS_P + kc,$$

an equation whose only solution, if $k = (u + v) > 0$, is

$$\mathfrak{M}(P) = \text{constant} = c = \frac{v}{u + v}.$$

The mathematical expectation is therefore independent of the
geographical position. In the calculations that follow we set $X - c = Y$,
so that $\mathfrak{M}(Y) = 0$, and from one generation to the next
$\mathfrak{M}(Y') = (1 - k) Y$. The variance of $X$, or of $Y$, will be

$$s^2 = \mathfrak{M}(Y^2) = \mathfrak{M}(X^2) - c^2 = c(1 - c).$$


The joint first moment of the two random variables $Y(C)$ and $Y(D)$
of the same generation will be designated by
$\mathfrak{M}[Y(C) Y(D)] = s^2 \phi(C, D)$; $\phi(C, D)$ is both coefficient of
coancestry and a priori correlation coefficient of these two random variables
and also of $X(C)$ and $X(D)$.* Let us call $\phi(C, C)$ its limit, obviously
less than one, when $D$ gets infinitely close to $C$, the two loci remaining
distinct (they may be, in case of random mating, the two homologous
loci of the same individual).

* Also of the local frequencies $q_C$ and $q_D$ in places $C$ and $D$, because these are
local arithmetic means of such random variables.

Two loci, $Y_{n+1}(C)$ and $Y_{n+1}(D)$, of two individuals in the
$(n + 1)$th generation born in $C$ and $D$ will have the probability
$g(E, C)\, g(F, D)\,dS_E\,dS_F$ of coming from parents born in $E$ and $F$
and the probability $g(E, C)\, g(E, D)\,(dS_E)^2$ of coming from parents both
born in the same neighborhood of a single site $E$; in the latter case
they will have the conditional probability $1/[2\delta(E)\,dS_E]$ of coming
from the same locus of the same parent and the probability
$1 - 1/[2\delta(E)\,dS_E]$ of coming from loci infinitely close but distinct.*
We have, therefore, when the places of birth, $E$ and $F$, of the parents
are known (conditional expectation),

$$\mathfrak{M}_c[Y_{n+1}(C)\, Y_{n+1}(D)] = (1 - k)^2\, Y_n(E)\, Y_n(F);$$

and when they are unknown (a priori expectation),

$$\mathfrak{M}[Y_{n+1}(C)\, Y_{n+1}(D)] = \mathfrak{M}\left\{\mathfrak{M}_c[Y_{n+1}(C)\, Y_{n+1}(D)]\right\}
= (1 - k)^2 \iint_A \iint_A \mathfrak{M}[Y_n(E)\, Y_n(F)]\, g(E, C)\, g(F, D)\,dS_E\,dS_F.$$

$\mathfrak{M}[Y_n(E)\, Y_n(F)]$ should be taken as equal to $s^2 \phi_n(E, F)$ if
the elements of area $dS_E$ and $dS_F$ are distinct; if they are not distinct,
$\mathfrak{M}[Y_n(E)\, Y_n(F)]$ should be taken as equal to

$$\frac{\mathfrak{M}[Y_n(E)^2]}{2\delta(E)\,dS_E} + \left[1 - \frac{1}{2\delta(E)\,dS_E}\right] s^2 \phi_n(E, E),$$

that is, equal to

$$s^2 \phi_n(E, E) + \frac{s^2 [1 - \phi_n(E, E)]}{2\delta(E)\,dS_E};$$

dividing by $s^2$ we have the "Fredholm iteration":

$$\phi_{n+1}(C, D) = (1 - k)^2 \iint_A \iint_A \phi_n(E, F)\, g(E, C)\, g(F, D)\,dS_E\,dS_F
+ (1 - k)^2 \iint_A \frac{1 - \phi_n(E, E)}{2\delta(E)}\, g(E, C)\, g(E, D)\,dS_E. \tag{3.3.1}$$

* Formula for monoecious random mating; in case of separate sexes, $\delta(E)$ is
twice the harmonic mean of male and female densities in $E$.


In the stationary state, if $\phi(E, E) = \lim \phi_n(E, E)$ was a known
function, (3.3.1) would be, for the unknown function
$\phi(C, D) = \lim \phi_{n+1}(C, D)$, a Fredholm equation with an integrable
kernel of norm $(1 - k)^2 < 1$ (if $k > 0$); it would then have a unique
solution given, whatever the initial values, by the same iteration as for
zero initial values:

$$\phi(C, D) = \iint_A \frac{1 - \phi(E, E)}{2\delta(E)} \sum_{n=0}^{\infty} (1 - k)^{2n+2}\, g_n(E, C)\, g_n(E, D)\,dS_E, \tag{3.3.2}$$

by setting $g_0(E, C) = g(E, C)$,

$$g_1(E, C) = \iint_A g(E, F)\, g(F, C)\,dS_F,$$

$$g_n(E, C) = \iint_A g_{n-1}(E, F)\, g(F, C)\,dS_F.$$

By taking $D = C$, we obtain a second Fredholm equation for the
determination of $\phi(E, E)$:

$$\phi(C, C) = \iint_A \frac{1 - \phi(E, E)}{2\delta(E)} \sum_{n=0}^{\infty} (1 - k)^{2n+2}\, g_n^2(E, C)\,dS_E. \tag{3.3.3}$$

This equation in general (when its kernel is integrable and of norm
$< 1$) has a single solution, $\phi(E, E)$; by putting it into (3.3.2), we
obtain $\phi(C, D)$.
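The iteration (3.3.1) can be run numerically on a discretized model. The sketch below is not the continuous scheme of the text: it assumes a ring of equally sized colonies (a "discontinuous case"), replaces the density term $1/[2\delta(E)\,dS_E]$ by $1/(2 \times \text{colony size})$, and uses a hypothetical three-point migration kernel. All names and numerical values are introduced for illustration only.

```python
import math


def iterate_coancestry(n_demes, deme_size, k, kernel, generations):
    """Iterate a discrete analogue of (3.3.1), starting from zero.

    phi[d] is the coancestry of two distinct loci taken d colonies apart.
    kernel maps a displacement to the probability that a parent was born
    that many colonies away (assumed symmetric here).
    """
    phi = [0.0] * n_demes
    same_copy = 1.0 / (2 * deme_size)   # chance of drawing the same locus
    for _ in range(generations):
        psi = list(phi)                 # identity for loci drawn with replacement
        psi[0] = same_copy + (1.0 - same_copy) * phi[0]
        updated = []
        for d in range(n_demes):
            total = 0.0
            for a, g_a in kernel.items():
                for b, g_b in kernel.items():
                    total += g_a * g_b * psi[(d + a - b) % n_demes]
            updated.append((1.0 - k) ** 2 * total)
        phi = updated
    return phi


kernel = {-1: 0.25, 0: 0.5, 1: 0.25}    # variance 0.5 per generation
k = 0.01
phi = iterate_coancestry(60, 20, k, kernel, 2000)

decay = phi[11] / phi[10]
predicted = math.exp(-math.sqrt(2 * k) / math.sqrt(0.5))
print("coancestry at distances 0, 5, 10, 20:", phi[0], phi[5], phi[10], phi[20])
print("ratio between neighbouring distances:", decay, "vs", predicted)
```

Starting from zero initial values mirrors the statement above that the stationary solution does not depend on the initial values; the ratio between neighbouring distances can be compared with the unidimensional exponential decrease $\exp(-\sqrt{2k}\,r/\sigma)$ quoted in Remark IV.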


REMARK I. A PARTIAL DIFFERENTIAL EQUATION
APPROXIMATING (3.3.1).
We may introduce the moments of the migration law, i.e.,

$$m_{pq} = \iint_A (x_E - x_C)^p (y_E - y_C)^q\, g(E, C)\,dS_E,$$

by replacing, in the right-hand side of (3.3.1), $\phi_n(E, F)$ by its Taylor
development,

$$\phi_n(E, F) = \phi_n(C, D) + (CE \cdot \nabla_C + DF \cdot \nabla_D)\,\phi_n(C, D) + \dots
+ \frac{1}{i!} (CE \cdot \nabla_C + DF \cdot \nabla_D)^i\, \phi_n(C, D) + \dots,$$

using the symbol $CE \cdot \nabla_C$ for the operator

$$(x_E - x_C) \frac{\partial}{\partial x_C} + (y_E - y_C) \frac{\partial}{\partial y_C},$$

its powers being defined as usual, $(\partial/\partial x)^i$ standing for
$\partial^i/\partial x^i$.

It is now easy to express the double area integral in the right-hand side
of (3.3.1) as a function of the partial derivatives of $\phi_n(C, D)$, the
coefficients being the moments, $m_{pq}$, calculated from place $C$, and the
similar moments, $m'_{pq}$, calculated from place $D$; the beginning of this
integral is (considering a symmetrical case, for the sake of simplicity,
because the odd moments are then equal to zero):

$$\phi_n(C, D) + \frac{1}{2} \left[ m_{20} \frac{\partial^2 \phi_n}{\partial x_C^2} + 2 m_{11} \frac{\partial^2 \phi_n}{\partial x_C\, \partial y_C} + m_{02} \frac{\partial^2 \phi_n}{\partial y_C^2}
+ m'_{20} \frac{\partial^2 \phi_n}{\partial x_D^2} + 2 m'_{11} \frac{\partial^2 \phi_n}{\partial x_D\, \partial y_D} + m'_{02} \frac{\partial^2 \phi_n}{\partial y_D^2} \right] + \dots$$

(The formula for unidimensional or tridimensional cases is naturally
of the same form.)

If the moments and their products are negligible from some order,
and if we replace $\phi_n(C, D)$ and $\phi_{n+1}(C, D)$ by their equilibrium
expression, $\phi(C, D)$, this last function is a solution (which tends to zero
when distance $CD$ tends to infinity) of a linear partial differential
equation, of which the nonhomogeneous term

$$\iint_A \frac{1 - \phi(E, E)}{2\delta(E)}\, g(E, C)\, g(E, D)\,dS_E$$

itself tends to zero when $CD$ tends to infinity.

3.3.1 Special Case of “Homogeneous and Isotropic” Migration


Let us suppose that the area occupied by the population can be
considered unlimited, that the density $\delta(E)$ is constant (in space and
time) and that $f(P, Q)$ depends only on the distance $PQ = r$; then
$g(P, Q)$ is equal to $f(P, Q)$. Let us set $g(P, Q) = g(r)$, so that it
becomes a function of a single variable, no longer of four; similarly
$g_n(P, Q) = g_n(r)$. From (3.3.3) we get

$$\phi(C, C) = \iint_A \frac{1 - \phi(E, E)}{2\delta} \sum_{n=0}^{\infty} (1 - k)^{2n+2}\, g_n^2(EC)\,dS_E,$$

an integral equation whose solution by successive approximations
gives $\phi(C, C) = \text{constant} = \phi_0$. It follows that

$$\phi_0 = \frac{1 - \phi_0}{2\delta}\, H,$$

from which

$$\phi_0 = \frac{H}{H + 2\delta}, \tag{3.3.3'}$$

where

$$H = \iint_A \sum_{n=0}^{\infty} (1 - k)^{2n+2}\, g_n^2(r)\,dS.$$

This is the value of the correlation coefficient between two closely
located loci.

Equation (3.3.2) shows that $\phi(C, D)$ depends only on the distance
$CD$. Let us write $\phi(C, D) = \phi(CD)$; then

$$\phi(CD) = \frac{1 - \phi_0}{2\delta} \iint_A \sum_{n=0}^{\infty} (1 - k)^{2n+2}\, g_n(EC)\, g_n(ED)\,dS_E. \tag{3.3.4}$$


This is the correlation coefficient between two loci whose distance
is $CD$.* We can express, in algebraic terms, the "products of composition"
(or "convolutions") which appear in (3.3.4) by considering
the Fourier transforms,

$$F(u, v) = \iint e^{i(ux + vy)}\, g\!\left(\sqrt{x^2 + y^2}\right) dx\,dy$$

and

$$K(u, v) = \iint e^{i(ux + vy)}\, \phi\!\left(\sqrt{x^2 + y^2}\right) dx\,dy,$$

because we have

$$K(u, v) = \frac{1 - \phi_0}{2\delta} \sum_{n=0}^{\infty} (1 - k)^{2n+2} F^{2n+2}(u, v)
= \frac{1 - \phi_0}{2\delta} \cdot \frac{(1 - k)^2 F^2}{1 - (1 - k)^2 F^2},$$

a formula which is also obtained by applying the Fourier transform
directly to formula (3.3.1).

* Or the coefficient of coancestry between $C$ and $D$ (see p. 65). For random
mating, $\phi(C, C) = \phi_0$ (coancestry between closely located loci) is also the
inbreeding coefficient (coancestry between the two homologous loci of one individual).

Thus $K$ is expressed as a function of $F(u, v)$, which is known.
From this, by inversion of the Fourier transform with two variables,
we have

$$\phi(x, y) = \frac{1}{4\pi^2} \cdot \frac{1 - \phi_0}{2\delta} \iint e^{-i(ux + vy)}\, \frac{(1 - k)^2 F^2}{1 - (1 - k)^2 F^2}\,du\,dv. \tag{3.3.5}$$

(By setting $x = y = 0$, we find again the linear equation for $\phi_0$.)

These calculations can be carried still further by assuming that
the displacement of each individual is a random movement following
the scheme of Polya, i.e., that the law $f(r)\,dS = g(r)\,dS$ is an
isotropic normal law,

$$G(\sigma^2)\,dS = (1/2\pi\sigma^2)\, e^{-r^2/2\sigma^2}\,dS,$$

which gives

$$F(u, v) = e^{-(\sigma^2/2)(u^2 + v^2)},$$

$$K(u, v) = \frac{1 - \phi_0}{2\delta} \sum_{p=1}^{\infty} (1 - k)^{2p}\, e^{-(2p\sigma^2/2)(u^2 + v^2)}.$$

From this we deduce series (3.3.4), which is easier to calculate than
(3.3.5), and from which

$$\phi(r) = \frac{1 - \phi_0}{2\delta} \sum_{p=1}^{\infty} (1 - k)^{2p}\, G(2p\sigma^2). \tag{3.3.4'}$$

This series is uniformly convergent when $k > 0$, since

$$G(2p\sigma^2) \le \frac{1}{4\pi p \sigma^2}.$$

Formula (3.3.3') can be found here by making $r = 0$, which leads
us to calculate

$$H = \sum_{p=1}^{\infty} (1 - k)^{2p}\, [1/4\pi p \sigma^2]
= [1/4\pi\sigma^2] \int_0^{(1-k)^2} \left[\sum_{p=1}^{\infty} x^{p-1}\right] dx
= -\log[1 - (1 - k)^2]/4\pi\sigma^2 = -\log(2k - k^2)/4\pi\sigma^2,$$

from which

$$\phi_0 = 1/[1 - 8\pi\sigma^2\delta/\log(2k - k^2)]. \tag{3.3.3''}$$

We can calculate $\phi_0$ easily, from the pressure $k$ (of overdominance
or of mutation) and from the number $\pi\sigma^2\delta$ of individuals in a circle
of radius $\sigma$ (in which reside, on the average, 40 per cent of the
individuals born at its center); the smaller these two quantities are,
the closer $\phi_0$ is to 1 (local quasihomogeneity); next, we deduce
from (3.3.4'),

$$\frac{\phi(r)}{\phi_0} = \frac{\displaystyle\sum_{p=1}^{\infty} \frac{(1 - k)^{2p}}{p}\, e^{-r^2/4p\sigma^2}}{\displaystyle\sum_{p=1}^{\infty} \frac{(1 - k)^{2p}}{p}}, \tag{3.3.4''}$$

which shows that the correlation at the distance $r$ decreases from $\phi_0$
to 0 when $r$ increases from 0 to $\infty$. The numerical value of this ratio
depends only on two quantities, $k$ and $r/\sigma$; it is, therefore, easy to
set up tables that will enable us to interpret the experimental results
with the help of this formula.
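A table of the kind mentioned above can be produced by truncating the two series of (3.3.4''). The following sketch assumes the normal migration law of this section; the function names, the truncation length and the parameter values are illustrative assumptions.

```python
import math


def phi0(k, sigma, density):
    """Local coancestry from (3.3.3''); density is individuals per unit area."""
    return 1.0 / (1.0 - 8 * math.pi * sigma ** 2 * density
                  / math.log(2 * k - k * k))


def correlation_ratio(r, k, sigma, terms=5000):
    """phi(r)/phi_0 from (3.3.4''), both series truncated after `terms` terms."""
    x = (1.0 - k) ** 2
    power = 1.0
    numerator = 0.0
    denominator = 0.0
    for p in range(1, terms + 1):
        power *= x                       # (1 - k)^(2p)
        numerator += power / p * math.exp(-r * r / (4 * p * sigma ** 2))
        denominator += power / p
    return numerator / denominator


k = 0.01
table = [(r, correlation_ratio(r, k, 1.0)) for r in (0.0, 1.0, 2.0, 5.0, 10.0)]
for r, ratio in table:
    print("r/sigma =", r, " phi(r)/phi_0 =", ratio)

print("phi_0 for one individual per unit area:", phi0(k, 1.0, 1.0))
```

The truncation length is adequate only while $(1-k)^{2p}$ is negligible at the last term, so smaller values of $k$ need more terms.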


REMARK II.
To calculate (3.3.4'') numerically, we can develop the numerator
according to the powers of $r^2$, the coefficient of $r^{2m}$ containing the
series

$$\sum_{p=1}^{\infty} (1 - k)^{2p} / p^{m+1},$$

whose sum is

$$\frac{1}{m!} \int_0^{(1-k)^2} \left[\log \frac{(1 - k)^2}{X}\right]^m \frac{dX}{1 - X};$$

we also deduce from this that the numerator of (3.3.4'') is equal to

$$\int_0^{(1-k)^2} J_0\!\left(\frac{r}{\sigma} \sqrt{\log \frac{(1 - k)^2}{X}}\right) \frac{dX}{1 - X},$$

$J_0$ being the Bessel function. By letting $r = 0$, we find again the
denominator $H$ (up to the factor $4\pi\sigma^2$).


REMARK III. 

If $k$ tends toward zero, the numerator and the denominator of (3.3.4'')
tend toward infinity, but their difference remains finite (according to
the properties of $J_0$); therefore, $H \to \infty$, $\phi_0$ and $\phi \to 1$;
and the population tends toward complete homogeneity, which is inevitable
in any population with a finite size in the absence of mutations.


REMARK IV. 


We may, in the partial differential equation shown to approximate 


(3.3.1), when o is small with respect to rV k, keep only the second 
moments 719 = My = Mio = Mo = 072, My = mi, = 0 (the higher 
moments, being higher powers of o, give negligible characteristic roots) ; 


r being large with respect to o, i g(E, C)g(E, D) is negligible and 


o(C, D) = g(r) is a solution, null at infinity, of the homogeneous 
Helmholtz equation 


$(r) = 1 — 16) + 0? AGO), 


2 2 
a ee oY which, in polar coordinates r 


as | a | 196. 
and 6, is, like $(r), independent of @ and equal to ap? rap? 8° 


Ad being the “Laplacian 


(when neglecting k?) we obtain the Bessel equation 


O% 106 2%. _ 

Or Conon, xotci © 
Of the two distinct solutions, J) and Ko, only Ky is bounded, thus 
giving the correlation (or coefficient of coancestry): 


74 Evolution of a Mendelian Population 


r 


where a is a constant and r is much greater than co. 

The same equation, and the same result, is true for every migration 
law all of whose reduced moments are bounded [15], and the Helm- 
holtz equation is valid for an isotropic migration of any dimensionality. 

; vr ; 0? 2 : ; 
So, in unidimensional cases, “¢ — os = 0 gives an exponential 
decrease proportional to exp —V 2kr/o. This exponential decrease 
has also been found in discontinuous cases [13, 15]. Weiss and Kimura 


[25] extended the formulas to the tridimensional case; Ad — a ¢=0 


. 20¢ 2k sles : 
SIVES oxy it Sia. ee 0, giving a decrease proportional to 
~ © exp — i 2kr/o. 


In all these formulas, o is the standard deviation of the migration 
along each axis of coordinates (migration may not be normal); the 
correlation ¢(r) with large distance r depends only on the ratio r/o 
and on the rate k. 
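The three decay laws quoted above can be checked numerically. The following is a minimal sketch, not part of the original text: it verifies by finite differences that $e^{-\kappa r}$, $K_0(\kappa r)$ and $e^{-\kappa r}/r$, with $\kappa = \sqrt{2k}/\sigma$, satisfy the radial equation $\varphi'' + \frac{d-1}{r}\varphi' - \kappa^2\varphi = 0$ in dimension $d = 1, 2, 3$. The values of `k` and `sigma`, the helper names, and the use of the integral representation $K_0(z) = \int_0^\infty e^{-z\cosh t}\,dt$ are assumptions made for illustration.

```python
import math

def k0(z, steps=4000, t_max=12.0):
    """K_0(z) from the integral of exp(-z cosh t) over t >= 0 (trapezoid rule)."""
    h = t_max / steps
    total = 0.5 * math.exp(-z)            # end point t = 0 (cosh 0 = 1)
    for i in range(1, steps + 1):         # the far end point is numerically zero
        total += math.exp(-z * math.cosh(i * h))
    return total * h

def radial_residual(phi, r, kappa, dim, h=1e-2):
    """phi'' + (dim - 1)/r * phi' - kappa^2 * phi at r, by central differences."""
    d1 = (phi(r + h) - phi(r - h)) / (2.0 * h)
    d2 = (phi(r + h) - 2.0 * phi(r) + phi(r - h)) / (h * h)
    return d2 + (dim - 1) / r * d1 - kappa ** 2 * phi(r)

k, sigma = 0.005, 1.0                      # illustrative values only
kappa = math.sqrt(2.0 * k) / sigma

candidates = {
    1: lambda r: math.exp(-kappa * r),       # unidimensional case
    2: lambda r: k0(kappa * r),              # two-dimensional case
    3: lambda r: math.exp(-kappa * r) / r,   # tridimensional case
}

def relative_residual(dim, r=20.0):
    """Residual of the radial equation, relative to the size of kappa^2 * phi."""
    phi = candidates[dim]
    return abs(radial_residual(phi, r, kappa, dim)) / (kappa ** 2 * phi(r))

for dim in (1, 2, 3):
    print(dim, relative_residual(dim))
```

In each dimension the printed relative residual should be small compared with 1, which is all the sketch claims; it says nothing about the constant $a$ or about short distances.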


These asymptotic formulas (3.3.4 and its varieties) are independent
of density, $\delta$, so we can use them in a population whose density
varies considerably over the years (as with Chetverikov's waves of
vitality).

If the individuals show a tendency to stay grouped in "colonies"
or "swarms," we shall take that into account by postulating that each
individual has respective probabilities $a$ and $(1 - a)$ of making an
infinitely small displacement with variance $\varepsilon^2$ or of making
a migration to a distant point with variance $\sigma^2$, i.e., by taking
$g(r) = a\,G(\varepsilon^2) + (1 - a)\,G(\sigma^2)$, which gives

$$F(u, v) = a\,e^{-(\varepsilon^2/2)(u^2 + v^2)} + (1 - a)\,e^{-(\sigma^2/2)(u^2 + v^2)},$$

from which we have a formula for $\varphi(r)$, i.e., the same asymptotic
expression, but with variance $a\varepsilon^2 + (1 - a)\sigma^2$.
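As a quick check of the variance of this mixed law, the sketch below (an illustration, with hypothetical parameter values) recovers the variance along one axis as $-\partial^2 F/\partial u^2$ at the origin, using a finite difference on the characteristic function $F(u, v)$ written above, and compares it with $a\varepsilon^2 + (1 - a)\sigma^2$.

```python
import math

def char_fn(u, v, a, eps2, sigma2):
    """Characteristic function of the mixture of two isotropic Gaussian laws."""
    s = u * u + v * v
    return a * math.exp(-0.5 * eps2 * s) + (1.0 - a) * math.exp(-0.5 * sigma2 * s)

def variance_from_char_fn(a, eps2, sigma2, h=1e-3):
    """Variance along one axis: minus the second derivative of F in u at the origin."""
    f = lambda u: char_fn(u, 0.0, a, eps2, sigma2)
    return -(f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)

a, eps2, sigma2 = 0.7, 0.01, 4.0     # hypothetical values, for illustration only
print(variance_from_char_fn(a, eps2, sigma2), a * eps2 + (1.0 - a) * sigma2)
```

The two printed numbers should agree to within the accuracy of the finite difference.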
The experimental determination of the correlation as a function
of distance, for verification of this theory, can be done in several
different ways.

(1) We can measure the frequency, $q_i$, of a Mendelian gene (without
geographic selection) at a large number of points, $P_i$, of a wide
territory; we shall take the general mean of these frequencies as an
estimate of $c$, and the mean of all the quantities

$$\frac{(q_i - c)(q_j - c)}{c(1 - c)},$$

calculated from two points, $P_i$ and $P_j$, whose distance is $r$, as an
estimate of $\varphi(r)$ and of its decrease when $r$ increases. This
verification was made by Lamotte on Cepaea nemoralis [10, 16].

(2) We can measure on different individuals a biometrical trait,
neutral for selection, whose intensity can be considered the additive
effect of a certain number of independent Mendelian random variables,
$X_i$, with expectations $M_i$ (each $X_i$ assuming values $s_i$ and
$t_i$ with probabilities $q_i$ and $p_i$ as functions of the location);
the mean correlation between two individuals, $I$ and $I'$, situated at a
distance $r$ will be an estimate of

$$\frac{\mathfrak{M}[\Sigma(X_i - M_i)\,\Sigma(X_i' - M_i)]}{\mathfrak{M}[\Sigma(X_i - M_i)]^2} = \frac{\Sigma\,\mathfrak{M}[(X_i - M_i)(X_i' - M_i)]}{\Sigma\,\mathfrak{M}(X_i - M_i)^2},$$

that is, will be an estimate of $\varphi(r)$ if we postulate that the
rate of mutation, $k$, is the same for all the genes concerned.* We must
remember, however, that the correlation decreases if a fraction of
the variability is not genetic (we then have to multiply by the
"heritability").
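The estimate described in (1) can be sketched as follows. This is only an illustration under stated assumptions: the sampling points are given as hypothetical `(x, y, q)` triples, pairs are grouped into distance classes of width `bin_width` (a choice not specified in the text), and the function name is invented for the example.

```python
from itertools import combinations
import math

def coancestry_by_distance(points, bin_width):
    """points: list of (x, y, q), q being the local gene frequency.
    Returns {distance class: mean of (q_i - c)(q_j - c) / (c (1 - c))}."""
    c = sum(q for _, _, q in points) / len(points)   # general mean of the frequencies
    norm = c * (1.0 - c)
    sums, counts = {}, {}
    for (x1, y1, q1), (x2, y2, q2) in combinations(points, 2):
        b = int(math.hypot(x2 - x1, y2 - y1) // bin_width)   # distance class of the pair
        sums[b] = sums.get(b, 0.0) + (q1 - c) * (q2 - c) / norm
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Made-up sample of three points on a line, for illustration only.
sample = [(0.0, 0.0, 0.2), (1.0, 0.0, 0.4), (3.0, 0.0, 0.6)]
print(coancestry_by_distance(sample, bin_width=1.5))
```

With real data each distance class would contain many pairs, and the decrease of the class means with distance is what would be compared with the asymptotic formulas.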


* This is only approximate. If we consider the fact that the two random
variables $X_i$ of each individual have, with random mating (see p. 70),
a coefficient of inbreeding equal to $\varphi_0$, the numerator and
denominator are $4\varphi(r)\,\Sigma'(X_i - M_i)^2$ and
$2(1 + \varphi_0)\,\Sigma'(X_i - M_i)^2$, the summation $\Sigma'$ being
now extended only to nonhomologous loci, from which the correlation [13]
between the genetic components of metrical traits (without dominance or
epistasis) is $\dfrac{2\varphi(r)}{1 + \varphi(0)}$.


3.3.2 Other Applications


(A) Panmixia in a finite isolated population of $N$ individuals can
be studied by assuming that the occupied area, $A$, is equal to 1,
$\delta$ is equal to $N$, and $g(P, Q)$ is equal either to 1, if $P$ and
$Q$ are within $A$, or to 0, if they are outside. We have, then, for $C$
and $D$ in $A$,

$$\varphi(C, D) = \varphi(C, C) = (1 - k)^2\left[\varphi(C, C) + \frac{1 - \varphi(C, C)}{2N}\right],$$

which gives

$$\varphi = \frac{(1 - k)^2}{2N - (2N - 1)(1 - k)^2}.$$


By equating this expression to $\varphi_0$ given by (3.3.3) or
(3.3.3''), we obtain the size $N$ of a panmictic group "equivalent" to a
group occupying a very small area and constituting part of a population
with random isotropic migration. Let


@ = constant = 


416 


DA ale? oni ie tees 


This concept of "equivalent effective number," introduced by Wright
[22, 23, 24] following entirely different reasoning, does not have the
weight that he attributes to it, because it does not account for the
correlation with distance.
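The panmictic value of $\varphi$ obtained in (A) can be checked by iterating the recurrence $\varphi \mapsto (1 - k)^2[\varphi + (1 - \varphi)/2N]$ from zero, which is the form the general recurrence takes when $g = 1$ inside the area. The sketch below uses hypothetical values of $N$ and $k$ and invented function names.

```python
def panmictic_coancestry(N, k):
    """Closed form: (1 - k)^2 / (2N - (2N - 1)(1 - k)^2)."""
    return (1 - k) ** 2 / (2 * N - (2 * N - 1) * (1 - k) ** 2)

def iterate_recurrence(N, k, generations=20000):
    """Iterate phi -> (1 - k)^2 * (phi + (1 - phi) / (2N)) starting from phi = 0."""
    phi = 0.0
    for _ in range(generations):
        phi = (1 - k) ** 2 * (phi + (1 - phi) / (2 * N))
    return phi

N, k = 50, 1e-3                      # illustrative values only
print(panmictic_coancestry(N, k), iterate_recurrence(N, k))
```

For small $k$ the closed form is close to $1/(1 + 4Nk)$, which is the quantity one would equate to $\varphi_0$ to define the "equivalent" $N$.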

(B) We can try to formulate a scheme of homogeneous but nonisotropic
migration (in an unlimited population of constant density) by
postulating that the displacement of an individual results from two
independent displacements with different laws of probability, in two
rectangular directions, i.e., that

$$g(P, Q)\,dS_Q = m(x_1 - x_2)\,n(y_1 - y_2)\,dx_2\,dy_2,$$

the coordinates of $P$ and $Q$ being designated by $x_1$, $y_1$, $x_2$,
$y_2$ ($m$ and $n$ being two functions each of one variable, whose
integral from $-\infty$ to $+\infty$ is equal to 1). Setting

$$m_n(x_1 - x_2) = \int_{-\infty}^{+\infty} m_{n-1}(x_1 - x_3)\,m(x_3 - x_2)\,dx_3$$

(and similarly for $n$), formula (3.3.2) becomes

$$\varphi(P, Q) = \iint \frac{1 - \varphi(E, E)}{2\delta} \sum_n (1 - k)^{2n+2}\,m_n(x_3 - x_1)\,m_n(x_3 - x_2)\,n_n(y_3 - y_1)\,n_n(y_3 - y_2)\,dx_3\,dy_3,$$

$E$ being the point with coordinates $x_3$, $y_3$.


In particular, if we take for $g(P, Q)$ Gauss's nonisotropic law,

$$g(P, Q) = \frac{1}{2\pi\sigma\rho}\,\exp\left[-\frac{(x_2 - x_1)^2}{2\sigma^2} - \frac{(y_2 - y_1)^2}{2\rho^2}\right],$$

we find $\varphi(E, E) = \text{constant} = \varphi_0$, and we have


x leech A ky | _(a—a)? — (e—m)? 


2po* 2 pp? 
oP, Q) = me be wi) 2p . ee ’ 


from which we calculate $\varphi_0$ by making $x_2 = x_1$ and
$y_2 = y_1$. For long distances, we obtain a homogeneous partial
differential equation (of elliptic type). We could introduce an
analogous scheme with three dimensions to represent the variability of
an aquatic population according to the two coordinates of surface and
depth. The approximate partial differential equation is easy to write,
since it is a generalization of the Helmholtz equation on p. 73.


3.4 APPENDIX: DISCONTINUOUS MIGRATIONS 


The case of discontinuous migrations refers to the model in which
the individuals of each generation do not inhabit a continuous area
but rather a discrete set of places (still called $A$), each place being
looked at as a point (still called $C$ or $D$ for two offspring, $I$ and
$J$, taken in $F_{n+1}$, and $E$ or $F$ for the possible places of their
parents, $P_I$ and $P_J$). The integrations in formula (3.3.1) have to
be replaced by summations on the lattice $A$ of all possible places $E$
and $F$; $g(E, C)$ is the probability, in place $C$, of an individual
coming from place $E$. We have

$$\sum_{E \in A} g(E, C) = 1;$$

$\dfrac{1}{2\delta(E)}\,dS_E$ has to be replaced by the conditional
probability for two parents, successively taken in place $E$, to be
identical, i.e., by $1/2N_E$ if there are $N_E$ equally probable
monoecious parents.

When there are $N_{1E}$ and $N_{2E}$ equally probable fathers and
mothers, the conditional probability is still $1/2N_E$ if we call $N_E$
the double of the harmonic mean,
$1/N_E = \tfrac{1}{4}(1/N_{1E} + 1/N_{2E})$. Formula (3.3.1) may now be
written

$$\varphi_{n+1}(C, D) = (1 - k)^2 \sum_{E \in A} \sum_{F \in A} \varphi_n(E, F)\,g(E, C)\,g(F, D) + (1 - k)^2 \sum_{E \in A} \frac{1 - \varphi_n(E, E)}{2N_E}\,g(E, C)\,g(E, D). \tag{3.4.1}$$
HEA E 


If we call $\varphi(C, D)$ some solution independent of $n$, the
difference $\varphi_{n+1}(C, D) - \varphi(C, D) = \psi_{n+1}(C, D)$ is
related to $\varphi_n(E, F) - \varphi(E, F) = \psi_n(E, F)$ by a
recurrence now homogeneous, and $\sup_{E, F} |\psi_n(E, F)|$ tends to
zero when $n$ goes to infinity; so $\varphi(C, D)$ (when it is supposed
to exist) is unique, and is the limit of $\varphi_n$ when $n$ goes to
infinity. Conversely, a limit existing for some particular initial
condition is a solution independent of $n$, and this is the same as the
limit for any initial condition.

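So, to obtain the limit, it is sufficient to take the particular initial
condition $\varphi_n(E, F) = 0$ at $n = 0$, which gives

$$\varphi(C, D) = (1 - k)^2 \sum_{E \in A} \frac{1 - \varphi(E, E)}{2N_E} \sum_{n=0}^{\infty} (1 - k)^{2n}\,g_n(E, C)\,g_n(E, D), \tag{3.4.2}$$

putting

$$g_n(E, C) = \sum_{F \in A} g_{n-1}(E, F)\,g(F, C) \tag{3.4.2'}$$

(with $g_0 = g$).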


Let us now suppose that the migration is homogeneous, i.e., that
$g(E, C)$ depends only on the components of the vector
$\overrightarrow{CE}$, each of which components may be supposed to be
integers $p$ and $q$.* Let us put $g(E, C) = \mu(p, q)$. The migration
law may then be defined by the "generating function"

$$G(\alpha, \beta) = \sum_{p, q} \mu(p, q)\,\alpha^p \beta^q,$$

which converges absolutely when $|\alpha| = 1$ and $|\beta| = 1$ (putting
$\alpha = e^{iu}$ and $\beta = e^{iv}$, we have a Fourier series).
* If the notations used for each point consist of two integers ("integer
coordinates"), $p$ and $q$ are the excesses of the coordinates of $E$
over the coordinates of $C$.

In the same manner, we put

$$G_n(\alpha, \beta) = \sum_{p, q} \mu_n(p, q)\,\alpha^p \beta^q, \tag{3.4.3}$$

where $\mu_n(p, q)$ is $g_n(E, C)$ expressed as a function of the
components $p$ and $q$ of the vector $\overrightarrow{CE}$; i.e., by
introducing in (3.4.2') the components $p'$ and $q'$ of
$\overrightarrow{FE}$ and $p_0$ and $q_0$ of $\overrightarrow{CF}$,
related by $p = p' + p_0$, $q = q' + q_0$:

$$G_n(\alpha, \beta) = \sum_{E \in A} g_n(E, C)\,\alpha^{p' + p_0} \beta^{q' + q_0} = \sum_{E \in A,\ F \in A} g_{n-1}(E, F)\,\alpha^{p'} \beta^{q'}\,g(F, C)\,\alpha^{p_0} \beta^{q_0}.$$

Since there is absolute convergence, we may make the summations
in any order. By summing first with respect to $E$, since the last three
factors do not depend on $E$, we have

$$G_n(\alpha, \beta) = \sum_{F \in A} G_{n-1}(\alpha, \beta)\,g(F, C)\,\alpha^{p_0} \beta^{q_0};$$

$G_{n-1}$ may be made a factor in the summation, which then gives

$$\sum_{F \in A} g(F, C)\,\alpha^{p_0} \beta^{q_0} = G(\alpha, \beta).$$

So we have

$$G_n(\alpha, \beta) = G_{n-1}(\alpha, \beta)\,G(\alpha, \beta),$$

and by iteration, since $g_0 = g$ and therefore $G_0 = G$,

$$G_n(\alpha, \beta) = [G(\alpha, \beta)]^{n+1}.$$
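The multiplicative property of the generating function can be illustrated in one dimension. The sketch below is an assumption-laden example, not the author's computation: a migration law is stored as a hypothetical dictionary mapping integer displacements to probabilities, the iterated law is built by repeated convolution as in (3.4.2') with $g_0 = g$, and its generating function is compared with the corresponding power of $G$ at a point of the unit circle.

```python
import math

def convolve(law_a, law_b):
    """Law of the sum of two independent displacements (dicts: step -> probability)."""
    out = {}
    for p, a in law_a.items():
        for q, b in law_b.items():
            out[p + q] = out.get(p + q, 0.0) + a * b
    return out

def gen_fn(law, alpha):
    """Generating function sum_p mu(p) * alpha**p of a one-dimensional law."""
    return sum(w * alpha ** p for p, w in law.items())

def max_gap(g, alpha, n_max=5):
    """Largest |G_n(alpha) - G(alpha)**(n + 1)| for n = 0..n_max, with g_0 = g."""
    g_n, worst = dict(g), 0.0
    for n in range(n_max + 1):
        worst = max(worst, abs(gen_fn(g_n, alpha) - gen_fn(g, alpha) ** (n + 1)))
        g_n = convolve(g_n, g)               # next iterated law, as in (3.4.2')
    return worst

m = 0.1                                       # hypothetical migration rate
g = {-1: m, 0: 1 - 2 * m, 1: m}               # adjacent-group law, for illustration
alpha = complex(math.cos(0.7), math.sin(0.7)) # a point with |alpha| = 1
print(max_gap(g, alpha))
```

The printed gap should be at rounding-error level; the same check works for a nonsymmetrical law, since the argument above does not use symmetry.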


Equation (3.4.2) for $\varphi(C, D)$ may then be transformed into an
equation giving a "generating function" of $\varphi$; it is sufficient to
note that if we complete the definition of homogeneous migration by
putting $N_E = N$ independent of place $E$, $\varphi(E, E) =$ constant
gives a solution for $\varphi(C, D)$, a solution which is known to be
unique; so we may put $\varphi(E, E) = \text{constant} = \varphi_0$.*
The right-hand side of (3.4.2), like the left-hand side, depends only on
the components of $\overrightarrow{CD}$, which may be called $x$ and $y$;
$\varphi(C, D)$ will now be called $\varphi(x, y)$.†

* We know that this is the inbreeding coefficient at any place.
† We keep the same name for the function of the new variables; they are
scalars, not points, and should not be confused.

If we multiply (3.4.2) by $\alpha^x \beta^y$, this amounts to multiplying
each term of the sum $\sum_E$ by
$\alpha^p \beta^q \alpha^{x-p} \beta^{y-q}$, $p$ and $q$ being the
components of $\overrightarrow{CE}$, $x - p$ and $y - q$ being the
components of
$\overrightarrow{CD} - \overrightarrow{CE} = -\overrightarrow{DE}$.
We then have

$$\alpha^x \beta^y \varphi(x, y) = \frac{1 - \varphi_0}{2N} \sum_{m=0}^{\infty} (1 - k)^{2m+2} \sum_{E} \alpha^p \beta^q\,g_m(E, C)\,\alpha^{x-p} \beta^{y-q}\,g_m(E, D),$$

setting $g = g_0$.

The right-hand side may, if $k > 0$, be summed up over all values
of $x$ and $y$ (components of $\overrightarrow{CD}$), i.e., over all
points $D$, when $|\alpha| = |\beta| = 1$; it is a multiple series whose
general term (indexed by $m$, $E$, $D$) has a modulus bounded by
$(1 - k)^{2m+2}\,g_m(E, C)\,g_m(E, D)$; but because of homogeneity,
$\sum_D g_m(E, D)$ is the same as $\sum_E g_m(E, D)$, and thus equal to
1; thus $\sum_E g_m(E, C) \sum_D g_m(E, D) = 1$, and then the series is,
as $\sum_m (1 - k)^{2m+2}$, absolutely convergent; so we may put
(when $|\alpha| = |\beta| = 1$)

$$\Phi(\alpha, \beta) = \sum_{x, y} \alpha^x \beta^y \varphi(x, y).$$
zy 


When summing up the right-hand side of the preceding expression for
$\alpha^x \beta^y \varphi(x, y)$, we may begin by noticing that
$g_m(E, D) = \mu_m(p - x, q - y)$, and calculating

$$\sum_{D} \alpha^{x-p} \beta^{y-q}\,\mu_m(p - x, q - y) = G_m(1/\alpha, 1/\beta) = [G(1/\alpha, 1/\beta)]^{m+1}.$$

Afterwards, the summation over $E$ gives a factor
$G_m(\alpha, \beta) = [G(\alpha, \beta)]^{m+1}$, and the summation over
$m$ gives a geometric series, which gives the same formula as that
obtained for the Fourier transform in the continuous case (but extended
now to nonsymmetrical migration),

$$\Phi(\alpha, \beta) = \frac{(1 - k)^2\,[(1 - \varphi_0)/2N]\,G(\alpha, \beta)\,G(1/\alpha, 1/\beta)}{1 - (1 - k)^2\,G(\alpha, \beta)\,G(1/\alpha, 1/\beta)}.$$

In the symmetrical case, where $g(E, C) = g(C, E)$, we have
$G(1/\alpha, 1/\beta) = G(\alpha, \beta)$.

But the "inversion," i.e., the problem of going back from the
Fourier series $\Phi(\alpha, \beta)$ to its coefficients
$\varphi(x, y)$, may be simpler than using integral formulations of these
coefficients [formula (3.3.5) written with $e^{iu} = \alpha$,
$e^{iv} = \beta$ and integrated over $u \in (0, 2\pi)$ and
$v \in (0, 2\pi)$]. For instance, let us study the unidimensional case,
when the coefficient of coancestry, for algebraic distance $x$, is called
$\varphi(x)$ and has a generating function,
$\Phi(\alpha) = \sum_x \alpha^x \varphi(x)$, given by

$$\Phi(\alpha) = \frac{(1 - k)^2\,[(1 - \varphi_0)/2N]\,G(\alpha)\,G(1/\alpha)}{1 - (1 - k)^2\,G(\alpha)\,G(1/\alpha)}.$$


In most cases where $G(\alpha)$ is a polynomial (as in the case of
migration between adjacent groups only, where
$G(\alpha) = 1 - 2m + m\alpha + m/\alpha$), we shall see that the
expansion into partial fractions gives only two terms with large
residues, i.e., those corresponding to the two solutions near 1 of the
equation $G(\alpha)\,G(1/\alpha) = \dfrac{1}{(1 - k)^2}$. These two
solutions, $\alpha_1$ and $\alpha_2$, are obtained by developing
$G(\alpha) = \sum_E \mu(p)\,\alpha^p$ [$\overrightarrow{CE} = p$ and
$\mu(p) = g(E, C)$] into the moments $m_i$ of the migration law, using
formulas

$$G(1) = \sum_E \mu(p) = 1,$$

$$\left(\frac{dG}{d\alpha}\right)_{\alpha = 1} = \sum p\,\mu(p) = m_1,$$

$$\left(\frac{d^2 G}{d\alpha^2}\right)_{\alpha = 1} = \sum p(p - 1)\,\mu(p) = m_2 - m_1,$$

from which

$$G(\alpha) = 1 + m_1(\alpha - 1) + (m_2 - m_1)(\alpha - 1)^2/2 + O[(\alpha - 1)^3],$$

$$G(1/\alpha) = 1 - m_1(\alpha - 1)/\alpha + (m_2 - m_1)(\alpha - 1)^2/2\alpha^2 + O[(\alpha - 1)^3]$$
$$= 1 - m_1(\alpha - 1) + m_1(\alpha - 1)^2 + (m_2 - m_1)(\alpha - 1)^2/2 + O[(\alpha - 1)^3],$$

$$G(\alpha)\,G(1/\alpha) = 1 + (m_2 - m_1^2)(\alpha - 1)^2 + O[(\alpha - 1)^3],$$

which introduces only (even in the nonsymmetrical case) the variance
$\sigma^2 = m_2 - m_1^2$; so we obtain, when $k$ is small and when we
equate $G(\alpha)\,G(1/\alpha)$ to $1/(1 - k)^2 \sim 1 + 2k$, two
solutions near 1, $\alpha_1$ and $\alpha_2$, given by

$$\alpha_i - 1 = \pm\sqrt{2k/\sigma^2}.$$
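For the adjacent-group law $G(\alpha) = 1 - 2m + m\alpha + m/\alpha$ the two roots near 1 can be written exactly, which gives a simple check of the approximation $\alpha_i - 1 = \pm\sqrt{2k/\sigma^2}$. In the sketch below (hypothetical values of $m$ and $k$, invented function names) the law is symmetrical, so $G(\alpha)G(1/\alpha) = G(\alpha)^2$ and the roots near 1 solve $G(\alpha) = 1/(1 - k)$, i.e. $\alpha + 1/\alpha = 2 + k/[m(1 - k)]$; the variance of this law is $\sigma^2 = 2m$.

```python
import math

def G(alpha, m):
    """Generating function of migration between adjacent groups only."""
    return 1.0 - 2.0 * m + m * alpha + m / alpha

def exact_roots(m, k):
    """Roots near 1 of G(alpha) * G(1/alpha) = 1 / (1 - k)^2 for the symmetric law."""
    c = 2.0 + k / (m * (1.0 - k))          # alpha + 1/alpha = c
    disc = math.sqrt(c * c - 4.0)
    return (c - disc) / 2.0, (c + disc) / 2.0

def approx_roots(m, k):
    """Approximation 1 -/+ sqrt(2k / sigma^2), with sigma^2 = 2m for this law."""
    d = math.sqrt(2.0 * k / (2.0 * m))
    return 1.0 - d, 1.0 + d

m, k = 0.1, 1e-4                            # illustrative values only
print(exact_roots(m, k), approx_roots(m, k))
```

The two pairs agree to the order expected from the neglected terms, and the exact roots are reciprocal to each other, as the symmetry $\alpha \leftrightarrow 1/\alpha$ of the equation requires.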
Let us recall that the expansion of $\Phi(\alpha)$ into partial fractions
uses all roots $\alpha_i$ of its denominator* and is

$$\sum_i \frac{A_i}{\alpha - \alpha_i},$$

each $A_i$ being given by

$$A_i = \frac{(1 - k)^2\,[(1 - \varphi_0)/2N]\,G(\alpha_i)\,G(1/\alpha_i)}{-(1 - k)^2\,\dfrac{d}{d\alpha}\,[G(\alpha)\,G(1/\alpha)]}$$

(where, in the denominator, $\alpha$ equals $\alpha_i$);

$$A_i \sim \frac{(1 - \varphi_0)/2N}{-\dfrac{d}{d\alpha}\,[1 + \sigma^2(\alpha - 1)^2]} = \frac{1 - \varphi_0}{4N\sigma^2(1 - \alpha_i)}$$

(where the $\alpha$ in the denominator of the left-hand fraction equals
$\alpha_i$).

* That is, all values $\alpha_i$ such that
$G(\alpha_i)\,G(1/\alpha_i) = 1/(1 - k)^2 \sim 1 + 2k$.

This shows that the residues $A_1$ and $A_2$ corresponding to the two
roots ($\alpha_1$ and $\alpha_2$) near 1 are much larger than the others:
if we suppose $\alpha_1 < 1 < \alpha_2$, the only terms with negative
exponents in the expansion of
$\dfrac{A_1}{\alpha - \alpha_1} + \dfrac{A_2}{\alpha - \alpha_2}$ will be
those* of

$$\frac{A_1}{\alpha - \alpha_1} = A_1 \sum_{x=0}^{\infty} \alpha_1^x\,\alpha^{-x-1},$$

thus giving, for negative values $-x$,

$$\varphi(-x - 1) \sim A_1 \alpha_1^x \sim A_1 \left(1 - \sqrt{2k/\sigma^2}\right)^x \sim A_1\,e^{-\sqrt{2k}\,x/\sigma}$$

with

$$A_1 \sim \frac{1 - \varphi_0}{4N\sigma^2(1 - \alpha_1)} \sim \frac{1 - \varphi_0}{4N\sigma\sqrt{2k}}$$

(and $x > 0$). Similarly, we obtain positive exponents by expansion of

$$\frac{A_2}{\alpha - \alpha_2} = (-A_2/\alpha_2) \sum_{x=0}^{\infty} \alpha_2^{-x}\,\alpha^x,$$

which gives (when $x \geq 0$)

$$\varphi(x) \sim -A_2\,\alpha_2^{-x-1} \sim A_1\,\alpha_2^{-x-1} \sim A_1 \left(1 + \sqrt{2k/\sigma^2}\right)^{-x}$$

and in particular†

$$\varphi(0) = \varphi_0 \sim A_1 \sim \frac{1 - \varphi_0}{4N\sigma\sqrt{2k}}, \qquad \varphi_0 \sim \frac{1}{1 + 4N\sigma\sqrt{2k}}.$$

* There may be other roots than $\alpha_1$ of modulus $< 1$, but they
have residues much smaller than $A_1$.

† For $x > 0$, we find $\varphi(x) = \varphi(-x)$; the coancestry is
obviously the same for two opposite values, $x$ and $-x$.

So we obtain the general formula (the sign $\sim$ meaning "only
when $k$ is small"):

$$\varphi(x) \sim \frac{e^{-\sqrt{2k}\,x/\sigma}}{1 + 4N\sigma\sqrt{2k}}.$$

The numerical decrease with distance is the same as in the
unidimensional continuous case; it may be seen that $\varphi_0$ also is
the same; but in the two-dimensional case, $\varphi_0$ is very different
(depending, as we have seen, on $\log 2k$ and not on $\sqrt{2k}$).
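The unidimensional formula can be compared with a direct numerical inversion of the generating function. The sketch below rests on assumptions that are not in the text: the places form a ring of `L` groups (so that the Fourier series reduces to a finite sum over the `L`-th roots of unity), migration is between adjacent groups only with rate `m`, and all parameter values and function names are hypothetical. It evaluates $\Phi$ on the unit circle, solves for $\varphi_0$ from $\varphi(0) = \varphi_0$, and prints the result next to the asymptotic expression.

```python
import math

def ring_coancestry(N, m, k, L, xs):
    """Invert Phi on a ring of L places (adjacent-group migration, rate m).
    Returns {x: phi(x)} for the separations xs; phi(0) is phi_0."""
    def psi(x):
        # Coefficient of alpha^x in Phi / (1 - phi_0), by a finite Fourier sum.
        total = 0.0
        for j in range(L):
            theta = 2.0 * math.pi * j / L
            g = 1.0 - 2.0 * m + 2.0 * m * math.cos(theta)  # G at alpha = e^{i theta}
            u = (1.0 - k) ** 2 * g * g                     # symmetric law: G(1/alpha) = G(alpha)
            total += u / (1.0 - u) * math.cos(theta * x)
        return total / (2.0 * N * L)
    psi0 = psi(0)
    phi0 = psi0 / (1.0 + psi0)                             # from phi_0 = (1 - phi_0) * psi(0)
    return {x: (phi0 if x == 0 else (1.0 - phi0) * psi(x)) for x in xs}

def asymptotic(N, sigma, k, x):
    """exp(-sqrt(2k) |x| / sigma) / (1 + 4 N sigma sqrt(2k))."""
    root = math.sqrt(2.0 * k)
    return math.exp(-root * abs(x) / sigma) / (1.0 + 4.0 * N * sigma * root)

N, m, k, L = 10, 0.1, 0.001, 400          # illustrative values only
sigma = math.sqrt(2.0 * m)                # standard deviation of the adjacent-group law
exact = ring_coancestry(N, m, k, L, xs=(0, 10))
for x in (0, 10):
    print(x, exact[x], asymptotic(N, sigma, k, x))
```

For these values the two columns should differ by a few percent at most, the gap coming from the roots and terms neglected in the partial-fraction argument; the agreement is expected to degrade when $k$ is not small compared with $m$.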

